2024-05-01 09:20:50,874 - root - [INFO] - Starting Experiments of Model merging
2024-05-01 09:20:50,874 - root - [INFO] - python src/merging.py -c src/configs/ia3_base.json -i T0_held_out -m T0_held_out -f TPA-ES_10_1.8 --kwargs split=test project_name=merging experiment_name=tpa-es10_ia3_base_test
2024-05-01 09:20:50,875 - root - [INFO] - Loading pretrained model and tokenizer for bigscience/T0_3B
2024-05-01 09:21:17,403 - root - [INFO] - Starting Experiments of Model merging
2024-05-01 09:21:17,404 - root - [INFO] - python src/merging.py -c src/configs/ia3_base.json -i T0_held_out -m T0_held_out -f TPA-ES_10_1.8 --kwargs split=test project_name=merging experiment_name=tpa-es10_ia3_base_test
2024-05-01 09:21:17,404 - root - [INFO] - Loading pretrained model and tokenizer for bigscience/T0_3B
2024-05-01 09:21:21,257 - root - [INFO] - Loading Checkpoints for Merging: bigscience/T0_3B
2024-05-01 09:21:21,490 - root - [INFO] - Pretrained Model: checkpoints/T0_3B	Mean:1.000000	STD: 0.000000
2024-05-01 09:21:21,490 - root - [INFO] - Merging tasks: ['rte', 'cb', 'winogrande', 'wic', 'wsc', 'copa', 'h-swag', 'story_cloze', 'anli-r1', 'anli-r2', 'anli-r3']
2024-05-01 09:21:21,507 - root - [INFO] - Finetune checkpoint 	Mean:tensor([1.0001, 0.9995, 0.9991, 0.9974, 1.0000, 0.9999, 0.9911, 0.9997, 0.9974,
        0.9992, 0.9969])	STD: tensor([0.0158, 0.0623, 0.0501, 0.0977, 0.0016, 0.0160, 0.1273, 0.0180, 0.0990,
        0.0656, 0.1040])
2024-05-01 09:21:21,518 - root - [INFO] - Task Vector Finetune checkpoint	Mean:tensor([ 1.3487e-04, -5.4787e-04, -8.6915e-04, -2.5962e-03, -3.1090e-06,
        -1.2284e-04, -8.8976e-03, -2.5548e-04, -2.6217e-03, -7.7368e-04,
        -3.1030e-03])	STD: tensor([0.0158, 0.0623, 0.0501, 0.0977, 0.0016, 0.0160, 0.1273, 0.0180, 0.0990,
        0.0656, 0.1040])
2024-05-01 09:21:24,041 - root - [INFO] - Performing Merging with TPA-ES_10_1.8 and Searching Lambdas
2024-05-01 09:21:24,382 - root - [INFO] - Unexpected keys: []
2024-05-01 09:21:24,396 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:21:24,408 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 09:21:25,135 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:21:25,152 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:21:30,356 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:2450	Template Idx:-2	Num Templates:10	Num Examples with Template:2450
2024-05-01 09:21:47,043 - root - [INFO] - Loading Checkpoints for Merging: bigscience/T0_3B
2024-05-01 09:21:47,276 - root - [INFO] - Pretrained Model: checkpoints/T0_3B	Mean:1.000000	STD: 0.000000
2024-05-01 09:21:47,276 - root - [INFO] - Merging tasks: ['rte', 'cb', 'winogrande', 'wic', 'wsc', 'copa', 'h-swag', 'story_cloze', 'anli-r1', 'anli-r2', 'anli-r3']
2024-05-01 09:21:47,292 - root - [INFO] - Finetune checkpoint 	Mean:tensor([1.0001, 0.9995, 0.9991, 0.9974, 1.0000, 0.9999, 0.9911, 0.9997, 0.9974,
        0.9992, 0.9969])	STD: tensor([0.0158, 0.0623, 0.0501, 0.0977, 0.0016, 0.0160, 0.1273, 0.0180, 0.0990,
        0.0656, 0.1040])
2024-05-01 09:21:47,302 - root - [INFO] - Task Vector Finetune checkpoint	Mean:tensor([ 1.3487e-04, -5.4787e-04, -8.6915e-04, -2.5962e-03, -3.1090e-06,
        -1.2284e-04, -8.8976e-03, -2.5548e-04, -2.6217e-03, -7.7368e-04,
        -3.1030e-03])	STD: tensor([0.0158, 0.0623, 0.0501, 0.0977, 0.0016, 0.0160, 0.1273, 0.0180, 0.0990,
        0.0656, 0.1040])
2024-05-01 09:21:51,244 - root - [INFO] - Performing Merging with TPA-ES_10_1.8 and Searching Lambdas
2024-05-01 09:21:51,610 - root - [INFO] - Unexpected keys: []
2024-05-01 09:21:51,624 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:21:51,624 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/inference/test/rte_template_-2/evaluation_runs.json
2024-05-01 09:21:51,624 - root - [INFO] - Found cached evaluation run that crashed
2024-05-01 09:21:51,636 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 09:22:02,369 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:22:02,386 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:22:07,595 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:2450	Template Idx:-2	Num Templates:10	Num Examples with Template:2450
2024-05-01 09:22:59,512 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 09:22:59,530 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:22:59,539 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 09:23:00,433 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:23:00,442 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:23:01,288 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:360	Template Idx:-2	Num Templates:15	Num Examples with Template:360
2024-05-01 09:23:08,958 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:23:08,966 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:23:09,876 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:23:09,931 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:23:28,044 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:6175	Template Idx:-2	Num Templates:5	Num Examples with Template:6175
2024-05-01 09:23:51,319 - root - [INFO] - Starting Experiments of Model merging
2024-05-01 09:23:51,320 - root - [INFO] - python src/merging.py -c src/configs/ia3_base.json -i T0_held_out -m T0_held_out -f TPA-ES_10_1.8 --multiple_prompts --kwargs split=test project_name=merging experiment_name=tpa-es10_ia3_base_test
2024-05-01 09:23:51,320 - root - [INFO] - Loading pretrained model and tokenizer for bigscience/T0_3B
2024-05-01 09:24:12,960 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 09:24:12,972 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:24:12,982 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 09:24:13,878 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:24:13,930 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:24:20,644 - root - [INFO] - Loading Checkpoints for Merging: bigscience/T0_3B
2024-05-01 09:24:20,877 - root - [INFO] - Pretrained Model: checkpoints/T0_3B	Mean:1.000000	STD: 0.000000
2024-05-01 09:24:20,878 - root - [INFO] - Merging tasks: ['rte', 'cb', 'winogrande', 'wic', 'wsc', 'copa', 'h-swag', 'story_cloze', 'anli-r1', 'anli-r2', 'anli-r3']
2024-05-01 09:24:20,895 - root - [INFO] - Finetune checkpoint 	Mean:tensor([1.0001, 0.9995, 0.9991, 0.9974, 1.0000, 0.9999, 0.9911, 0.9997, 0.9974,
        0.9992, 0.9969])	STD: tensor([0.0158, 0.0623, 0.0501, 0.0977, 0.0016, 0.0160, 0.1273, 0.0180, 0.0990,
        0.0656, 0.1040])
2024-05-01 09:24:20,906 - root - [INFO] - Task Vector Finetune checkpoint	Mean:tensor([ 1.3487e-04, -5.4787e-04, -8.6915e-04, -2.5962e-03, -3.1090e-06,
        -1.2284e-04, -8.8976e-03, -2.5548e-04, -2.6217e-03, -7.7368e-04,
        -3.1030e-03])	STD: tensor([0.0158, 0.0623, 0.0501, 0.0977, 0.0016, 0.0160, 0.1273, 0.0180, 0.0990,
        0.0656, 0.1040])
2024-05-01 09:24:23,683 - root - [INFO] - Performing Merging with TPA-ES_10_1.8 and Searching Lambdas
2024-05-01 09:24:24,024 - root - [INFO] - Unexpected keys: []
2024-05-01 09:24:24,178 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:24:24,191 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 09:24:29,080 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:6060	Template Idx:-2	Num Templates:10	Num Examples with Template:6060
2024-05-01 09:24:34,897 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:24:34,914 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:24:35,442 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 09:24:41,608 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 09:24:41,608 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:24:41,618 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:24:42,141 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 09:24:47,533 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 09:24:47,533 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:24:47,542 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:24:48,065 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 09:24:53,454 - root - [INFO] - 	!!!Scores: {'accuracy': 0.82, 'average': 0.82}
2024-05-01 09:24:53,454 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:24:53,463 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:24:53,983 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 09:24:59,307 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:24:59,308 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:24:59,316 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:24:59,834 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 09:25:05,229 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 09:25:05,230 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:25:05,239 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:25:05,761 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 09:25:11,146 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 09:25:11,146 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:25:11,155 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:25:11,678 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 09:25:16,757 - root - [INFO] - 	!!!Scores: {'accuracy': 0.599, 'average': 0.599}
2024-05-01 09:25:16,769 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:25:16,779 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 09:25:17,002 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 09:25:17,002 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:25:17,011 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:25:17,529 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 09:25:23,054 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 09:25:23,054 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:25:23,061 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:25:23,579 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 09:25:27,482 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:25:27,498 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:25:28,970 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 09:25:28,970 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:25:28,979 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:25:29,503 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 09:25:29,572 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:720	Template Idx:-2	Num Templates:10	Num Examples with Template:720
2024-05-01 09:25:34,991 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 09:25:34,991 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:34,999 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 09:25:38,232 - root - [INFO] - 	!!!Scores: {'accuracy': 0.583, 'average': 0.583}
2024-05-01 09:25:38,247 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:25:38,256 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 09:25:45,713 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:25:45,723 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:45,788 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 09:25:47,229 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:25:47,229 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:47,237 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:47,287 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 09:25:48,731 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:25:48,731 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:48,740 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:48,805 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 09:25:48,991 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:25:49,005 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:25:50,270 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:25:50,270 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:50,277 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:50,329 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 09:25:50,710 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:544	Template Idx:-2	Num Templates:8	Num Examples with Template:544
2024-05-01 09:25:51,759 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:25:51,760 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:51,767 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:51,819 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 09:25:53,257 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:25:53,257 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:53,266 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:53,332 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 09:25:54,778 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 09:25:54,779 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:54,786 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:54,838 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 09:25:56,023 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 09:25:56,037 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 09:25:56,046 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 09:25:56,272 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 09:25:56,272 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:56,279 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:56,343 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 09:25:57,790 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:25:57,790 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:57,799 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:57,851 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 09:25:59,297 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:25:59,297 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:25:59,305 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:25:59,357 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 09:26:00,799 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:26:00,799 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:26:00,806 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:26:00,871 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 09:26:02,323 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 09:26:02,323 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:26:02,330 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:26:02,381 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 09:26:03,819 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:26:03,819 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:26:03,826 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:26:03,877 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 09:26:05,361 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:26:05,361 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:26:05,370 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:26:05,423 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 09:26:06,749 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:26:06,862 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:26:06,863 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:26:06,870 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:26:06,935 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 09:26:07,438 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 09:26:08,410 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:26:08,410 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:26:19,125 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:26:19,181 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:26:22,811 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:26:29,405 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:-2	Num Templates:1	Num Examples with Template:10010
2024-05-01 09:26:32,392 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 09:26:32,392 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:26:32,401 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:26:35,956 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:26:45,615 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 09:26:45,615 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:26:45,624 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:26:49,234 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:26:58,958 - root - [INFO] - 	!!!Scores: {'accuracy': 0.688, 'average': 0.688}
2024-05-01 09:26:58,959 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:26:58,968 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:27:02,610 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:27:12,673 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 09:27:12,674 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:27:12,682 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:27:16,297 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:27:26,237 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 09:27:26,238 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:27:26,246 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 09:27:36,958 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:27:37,010 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:27:38,516 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 09:27:43,832 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 09:27:43,832 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:27:43,841 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:27:45,336 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 09:27:50,510 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 09:27:50,510 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:27:50,519 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:27:52,022 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 09:27:57,705 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 09:27:57,706 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:27:57,714 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:27:59,339 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 09:28:05,154 - root - [INFO] - 	!!!Scores: {'accuracy': 0.538, 'average': 0.538}
2024-05-01 09:28:05,154 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:28:05,163 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:28:06,656 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 09:28:12,049 - root - [INFO] - 	!!!Scores: {'accuracy': 0.545, 'average': 0.545}
2024-05-01 09:28:12,049 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:28:12,058 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:28:13,573 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 09:28:19,400 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 09:28:19,400 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:28:19,409 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:28:20,908 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 09:28:26,309 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 09:28:26,309 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:28:26,318 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:28:27,807 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 09:28:33,364 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 09:28:33,365 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:28:33,373 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:28:34,874 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 09:28:40,943 - root - [INFO] - 	!!!Scores: {'accuracy': 0.624, 'average': 0.624}
2024-05-01 09:28:40,943 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:28:40,951 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:28:42,431 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 09:28:47,154 - root - [INFO] - 	!!!Scores: {'accuracy': 0.652, 'average': 0.652}
2024-05-01 09:28:47,154 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:28:47,163 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 09:28:48,050 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:28:48,067 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:28:48,259 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 09:28:50,049 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 09:28:50,049 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:28:50,056 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:28:50,232 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 09:28:51,996 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 09:28:51,996 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:28:52,004 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:28:52,222 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 09:28:54,148 - root - [INFO] - 	!!!Scores: {'accuracy': 0.708, 'average': 0.708}
2024-05-01 09:28:54,149 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:28:54,157 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:28:54,377 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 09:28:56,305 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 09:28:56,306 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:28:56,313 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:28:56,498 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 09:28:58,252 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 09:28:58,253 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:28:58,261 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:28:58,438 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 09:29:00,234 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 09:29:00,234 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:29:00,242 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:29:00,417 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 09:29:02,228 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 09:29:02,229 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:29:02,237 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:29:02,515 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 09:29:04,282 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 09:29:04,282 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:29:04,289 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:29:04,462 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 09:29:06,259 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 09:29:06,259 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:29:06,267 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:29:06,562 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 09:29:08,321 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 09:29:08,322 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:29:08,329 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 09:29:09,226 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:29:09,241 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:29:09,460 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 09:29:11,049 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 09:29:11,049 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:29:11,057 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:29:11,268 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 09:29:12,885 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 09:29:12,885 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:29:12,892 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:29:13,107 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 09:29:14,706 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 09:29:14,706 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:29:14,715 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:29:14,938 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 09:29:16,468 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 09:29:16,468 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:29:16,475 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:29:16,683 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 09:29:18,283 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 09:29:18,283 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:29:18,291 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:29:18,504 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 09:29:20,108 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 09:29:20,108 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:29:20,115 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:29:20,326 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 09:29:21,885 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 09:29:21,885 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:29:21,893 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:29:22,102 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 09:29:23,657 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 09:29:23,657 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 09:29:23,665 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 09:29:24,573 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:29:25,273 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 09:29:47,223 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 09:35:00,387 - root - [INFO] - 	!!!Scores: {'accuracy': 0.427, 'average': 0.427}
2024-05-01 09:35:00,387 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 09:35:01,098 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 09:35:01,210 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 09:35:07,235 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 09:35:31,580 - root - [INFO] - 	!!!Scores: {'accuracy': 0.929, 'average': 0.929}
2024-05-01 09:35:31,581 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 09:35:31,590 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 09:35:37,672 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 09:36:02,420 - root - [INFO] - 	!!!Scores: {'accuracy': 0.929, 'average': 0.929}
2024-05-01 09:36:02,420 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 09:36:02,429 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 09:36:08,658 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 09:36:33,239 - root - [INFO] - 	!!!Scores: {'accuracy': 0.929, 'average': 0.929}
2024-05-01 09:36:33,239 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 09:36:33,248 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 09:36:39,356 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 09:37:03,966 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 09:37:03,966 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 09:37:03,975 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 09:37:10,020 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 09:37:35,297 - root - [INFO] - 	!!!Scores: {'accuracy': 0.923, 'average': 0.923}
2024-05-01 09:37:35,298 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:37:35,989 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:37:36,041 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:37:37,865 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:37:56,377 - root - [INFO] - 	!!!Scores: {'accuracy': 0.626, 'average': 0.626}
2024-05-01 09:37:56,378 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:37:56,387 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:37:58,240 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:38:14,792 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 09:38:14,793 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:38:14,802 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:38:16,637 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:38:32,894 - root - [INFO] - 	!!!Scores: {'accuracy': 0.685, 'average': 0.685}
2024-05-01 09:38:32,894 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:38:32,903 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:38:34,719 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:38:51,151 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 09:38:51,151 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:38:51,160 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:38:52,975 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:39:09,757 - root - [INFO] - 	!!!Scores: {'accuracy': 0.688, 'average': 0.688}
2024-05-01 09:39:09,758 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:39:09,767 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:39:11,598 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:39:28,063 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 09:39:28,063 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:39:28,072 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:39:30,456 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:39:48,590 - root - [INFO] - 	!!!Scores: {'accuracy': 0.659, 'average': 0.659}
2024-05-01 09:39:48,591 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:39:48,600 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:39:50,449 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:40:07,151 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 09:40:07,151 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:40:07,161 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:40:09,008 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:40:25,527 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 09:40:25,527 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:40:25,535 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:40:27,924 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:40:45,089 - root - [INFO] - 	!!!Scores: {'accuracy': 0.622, 'average': 0.622}
2024-05-01 09:40:45,089 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:40:45,098 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:40:47,475 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:41:04,319 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 09:41:04,320 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:41:04,328 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:41:06,138 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:41:22,667 - root - [INFO] - 	!!!Scores: {'accuracy': 0.68, 'average': 0.68}
2024-05-01 09:41:22,667 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:41:22,676 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:41:25,043 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:41:41,909 - root - [INFO] - 	!!!Scores: {'accuracy': 0.633, 'average': 0.633}
2024-05-01 09:41:41,909 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:41:41,916 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:41:44,276 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:42:02,017 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 09:42:02,017 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 09:42:02,027 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:42:03,877 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:42:20,404 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 09:42:20,405 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:42:21,116 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:42:21,168 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:42:22,975 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:42:40,930 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 09:42:40,930 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:42:40,939 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:42:42,798 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:42:59,051 - root - [INFO] - 	!!!Scores: {'accuracy': 0.524, 'average': 0.524}
2024-05-01 09:42:59,052 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:42:59,061 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:43:00,890 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:43:16,753 - root - [INFO] - 	!!!Scores: {'accuracy': 0.532, 'average': 0.532}
2024-05-01 09:43:16,753 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:43:16,763 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:43:18,580 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:43:34,626 - root - [INFO] - 	!!!Scores: {'accuracy': 0.49, 'average': 0.49}
2024-05-01 09:43:34,626 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:43:34,635 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:43:36,439 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:43:52,904 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 09:43:52,905 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:43:52,914 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:43:54,749 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:44:10,861 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 09:44:10,861 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:44:10,870 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:44:13,259 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:44:30,904 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 09:44:30,904 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:44:30,914 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:44:32,771 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:44:49,171 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 09:44:49,172 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:44:49,181 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:44:51,030 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:45:07,269 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 09:45:07,269 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:45:07,277 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:45:09,666 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:45:26,522 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 09:45:26,522 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:45:26,531 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:45:28,910 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:45:45,478 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 09:45:45,478 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:45:45,486 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:45:47,303 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:46:03,541 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 09:46:03,542 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:46:03,551 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:46:05,921 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:46:22,458 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 09:46:22,458 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:46:22,466 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:46:24,828 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:46:42,204 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 09:46:42,205 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 09:46:42,214 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 09:46:44,069 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 09:47:00,305 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 09:47:00,305 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:47:00,995 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:47:01,054 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:47:03,234 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:47:28,566 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 09:47:28,566 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:47:28,575 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:47:30,805 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:47:53,675 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 09:47:53,676 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:47:53,685 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:47:55,881 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:48:18,373 - root - [INFO] - 	!!!Scores: {'accuracy': 0.486, 'average': 0.486}
2024-05-01 09:48:18,374 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:48:18,383 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:48:20,553 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:48:43,443 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 09:48:43,443 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:48:43,452 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:48:45,645 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:49:08,890 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 09:49:08,891 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:49:08,900 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:49:11,105 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:49:33,858 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 09:49:33,858 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:49:33,868 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:49:36,739 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:50:01,610 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 09:50:01,610 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:50:01,619 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:50:03,847 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:50:26,988 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 09:50:26,988 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:50:26,998 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:50:29,236 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:50:52,081 - root - [INFO] - 	!!!Scores: {'accuracy': 0.483, 'average': 0.483}
2024-05-01 09:50:52,081 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:50:52,089 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:50:54,962 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:51:18,689 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 09:51:18,689 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:51:18,698 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:51:21,561 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:51:44,918 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 09:51:44,918 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:51:44,926 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:51:47,103 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:52:09,955 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 09:52:09,955 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:52:09,964 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:52:12,808 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:52:36,171 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 09:52:36,171 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:52:36,179 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:52:39,014 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:53:03,409 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 09:53:03,409 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 09:53:03,417 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 09:53:05,643 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 09:53:28,493 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 09:53:28,563 - root - [INFO] - Unexpected keys: []
2024-05-01 09:53:28,793 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:53:28,794 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 09:53:28,794 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:53:28,803 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 09:53:29,721 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:53:29,745 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:53:30,263 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 09:53:36,029 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 09:53:36,030 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:53:36,030 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 09:53:36,030 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:53:36,040 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:53:36,560 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 09:53:42,017 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 09:53:42,017 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:53:42,017 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 09:53:42,018 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:53:42,026 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:53:42,546 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 09:53:48,001 - root - [INFO] - 	!!!Scores: {'accuracy': 0.82, 'average': 0.82}
2024-05-01 09:53:48,001 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:53:48,002 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 09:53:48,002 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:53:48,010 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:53:48,525 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 09:53:53,904 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:53:53,904 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:53:53,905 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 09:53:53,905 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:53:53,913 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:53:54,429 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 09:53:59,883 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 09:53:59,883 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:53:59,883 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 09:53:59,883 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:53:59,892 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:54:00,412 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 09:54:05,864 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 09:54:05,864 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:54:05,864 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 09:54:05,864 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:05,873 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:54:06,393 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 09:54:11,779 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 09:54:11,780 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:54:11,780 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 09:54:11,780 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:11,788 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:54:12,304 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 09:54:17,888 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 09:54:17,889 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:54:17,889 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 09:54:17,889 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:17,896 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:54:18,412 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 09:54:23,857 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 09:54:23,857 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 09:54:23,857 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 09:54:23,857 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:23,866 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 09:54:24,385 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 09:54:29,924 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 09:54:29,924 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:29,924 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 09:54:29,924 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:29,932 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 09:54:30,856 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:54:30,865 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:30,926 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:32,374 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:54:32,374 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:32,374 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 09:54:32,374 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:32,383 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:32,434 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:33,881 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:54:33,881 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:33,881 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 09:54:33,881 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:33,889 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:33,953 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:35,421 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:54:35,421 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:35,421 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 09:54:35,422 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:35,430 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:35,482 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:36,918 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:54:36,919 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:36,919 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 09:54:36,919 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:36,926 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:36,977 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:38,418 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:54:38,418 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:38,418 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 09:54:38,418 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:38,425 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:38,490 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:39,939 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 09:54:39,939 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:39,939 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 09:54:39,939 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:39,946 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:39,998 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:41,438 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 09:54:41,439 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:41,439 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 09:54:41,439 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:41,447 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:41,512 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:42,965 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:54:42,965 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:42,965 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 09:54:42,965 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:42,972 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:43,024 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:44,466 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:54:44,466 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:44,467 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 09:54:44,467 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:44,474 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:44,525 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:45,971 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:54:45,971 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:45,971 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 09:54:45,971 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:45,978 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:46,043 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:47,502 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 09:54:47,502 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:47,502 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 09:54:47,502 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:47,510 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:47,562 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:49,008 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 09:54:49,008 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:49,008 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 09:54:49,008 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:49,016 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:49,067 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:50,552 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:54:50,552 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:50,552 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 09:54:50,552 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:50,560 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:50,611 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:52,052 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:54:52,052 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 09:54:52,052 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 09:54:52,052 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:52,059 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 09:54:52,124 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 09:54:53,603 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 09:54:53,603 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:54:53,603 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 09:54:53,603 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:54:54,536 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:54:54,591 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:54:58,221 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:55:07,872 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 09:55:07,872 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:55:07,873 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 09:55:07,873 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:55:07,882 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:55:11,431 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:55:21,144 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 09:55:21,145 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:55:21,145 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 09:55:21,145 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:55:21,153 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:55:24,763 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:55:34,542 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 09:55:34,543 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:55:34,543 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 09:55:34,543 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:55:34,552 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:55:38,199 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:55:48,303 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 09:55:48,304 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 09:55:48,304 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 09:55:48,304 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:55:48,313 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 09:55:51,925 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 09:56:01,900 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 09:56:01,900 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:56:01,901 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 09:56:01,901 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:56:01,910 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 09:56:02,825 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:56:02,877 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:56:04,375 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 09:56:09,714 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 09:56:09,714 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:56:09,714 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 09:56:09,714 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:56:09,723 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:56:11,222 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 09:56:16,416 - root - [INFO] - 	!!!Scores: {'accuracy': 0.652, 'average': 0.652}
2024-05-01 09:56:16,417 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:56:16,417 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 09:56:16,417 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:56:16,426 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:56:17,931 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 09:56:23,638 - root - [INFO] - 	!!!Scores: {'accuracy': 0.652, 'average': 0.652}
2024-05-01 09:56:23,638 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:56:23,638 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 09:56:23,638 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:56:23,647 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:56:25,145 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 09:56:30,976 - root - [INFO] - 	!!!Scores: {'accuracy': 0.531, 'average': 0.531}
2024-05-01 09:56:30,976 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:56:30,976 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 09:56:30,976 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:56:30,985 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:56:32,469 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 09:56:37,878 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 09:56:37,878 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:56:37,878 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 09:56:37,878 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:56:37,887 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:56:39,388 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 09:56:45,226 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 09:56:45,226 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:56:45,226 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 09:56:45,227 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:56:45,235 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:56:46,731 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 09:56:52,144 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 09:56:52,144 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:56:52,144 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 09:56:52,145 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:56:52,153 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:56:53,634 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 09:56:59,200 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 09:56:59,200 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:56:59,200 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 09:56:59,200 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:56:59,209 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:57:00,710 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 09:57:06,787 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 09:57:06,787 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 09:57:06,787 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 09:57:06,787 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:06,795 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 09:57:08,275 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 09:57:12,991 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 09:57:12,992 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:12,992 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 09:57:12,992 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:13,000 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 09:57:13,673 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:57:13,689 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:13,879 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:15,672 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 09:57:15,672 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:15,672 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 09:57:15,672 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:15,681 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:15,854 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:17,618 - root - [INFO] - 	!!!Scores: {'accuracy': 0.542, 'average': 0.542}
2024-05-01 09:57:17,618 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:17,618 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 09:57:17,618 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:17,627 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:17,844 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:19,769 - root - [INFO] - 	!!!Scores: {'accuracy': 0.708, 'average': 0.708}
2024-05-01 09:57:19,769 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:19,770 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 09:57:19,770 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:19,777 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:19,993 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:21,927 - root - [INFO] - 	!!!Scores: {'accuracy': 0.694, 'average': 0.694}
2024-05-01 09:57:21,927 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:21,927 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 09:57:21,927 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:21,935 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:22,119 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:23,868 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 09:57:23,868 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:23,868 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 09:57:23,868 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:23,875 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:24,051 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:25,856 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 09:57:25,856 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:25,856 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 09:57:25,856 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:25,865 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:26,039 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:27,849 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 09:57:27,849 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:27,849 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 09:57:27,849 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:27,856 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:28,130 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:29,899 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 09:57:29,899 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:29,899 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 09:57:29,900 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:29,907 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:30,082 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:31,880 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 09:57:31,880 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 09:57:31,881 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 09:57:31,881 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:31,889 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 09:57:32,179 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 09:57:33,939 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 09:57:33,939 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:57:33,939 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 09:57:33,939 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:33,946 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 09:57:34,641 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:57:34,656 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:57:34,874 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 09:57:36,464 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 09:57:36,464 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:57:36,464 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 09:57:36,464 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:36,473 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:57:36,682 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 09:57:38,298 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 09:57:38,298 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:57:38,298 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 09:57:38,299 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:38,306 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:57:38,517 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 09:57:40,115 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 09:57:40,115 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:57:40,115 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 09:57:40,116 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:40,124 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:57:40,347 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 09:57:41,876 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 09:57:41,876 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:57:41,876 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 09:57:41,877 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:41,884 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:57:42,092 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 09:57:43,693 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 09:57:43,694 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:57:43,694 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 09:57:43,694 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:43,703 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:57:43,913 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 09:57:45,519 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 09:57:45,519 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:57:45,519 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 09:57:45,519 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:45,527 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:57:45,735 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 09:57:47,296 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 09:57:47,297 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 09:57:47,297 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 09:57:47,297 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:47,304 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 09:57:47,514 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 09:57:49,070 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 09:57:49,070 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 09:57:49,070 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 09:57:49,070 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 09:57:49,084 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 09:57:49,768 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 09:57:50,457 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 09:58:12,682 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 10:03:26,021 - root - [INFO] - 	!!!Scores: {'accuracy': 0.426, 'average': 0.426}
2024-05-01 10:03:26,021 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:03:26,021 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 10:03:26,021 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:03:26,711 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 10:03:26,823 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:03:32,838 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:03:57,186 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 10:03:57,187 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:03:57,187 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 10:03:57,187 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:03:57,196 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:04:03,275 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:04:28,033 - root - [INFO] - 	!!!Scores: {'accuracy': 0.927, 'average': 0.927}
2024-05-01 10:04:28,034 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:04:28,034 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 10:04:28,034 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:04:28,043 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:04:34,078 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:04:58,670 - root - [INFO] - 	!!!Scores: {'accuracy': 0.93, 'average': 0.93}
2024-05-01 10:04:58,670 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:04:58,670 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 10:04:58,670 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:04:58,679 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:05:04,754 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:05:29,372 - root - [INFO] - 	!!!Scores: {'accuracy': 0.923, 'average': 0.923}
2024-05-01 10:05:29,372 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:05:29,372 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 10:05:29,372 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:05:29,381 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:05:35,425 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:06:00,710 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 10:06:00,710 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:06:00,710 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 10:06:00,710 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:06:01,405 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:06:01,458 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:06:03,274 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:06:21,798 - root - [INFO] - 	!!!Scores: {'accuracy': 0.626, 'average': 0.626}
2024-05-01 10:06:21,799 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:06:21,799 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 10:06:21,799 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:06:21,809 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:06:23,655 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:06:40,205 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 10:06:40,205 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:06:40,206 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 10:06:40,206 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:06:40,215 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:06:42,038 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:06:58,309 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 10:06:58,309 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:06:58,309 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 10:06:58,310 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:06:58,318 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:07:00,123 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:07:16,557 - root - [INFO] - 	!!!Scores: {'accuracy': 0.664, 'average': 0.664}
2024-05-01 10:07:16,557 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:07:16,557 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 10:07:16,557 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:07:16,566 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:07:18,372 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:07:35,154 - root - [INFO] - 	!!!Scores: {'accuracy': 0.682, 'average': 0.682}
2024-05-01 10:07:35,154 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:07:35,154 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 10:07:35,154 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:07:35,163 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:07:36,989 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:07:53,455 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 10:07:53,455 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:07:53,455 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 10:07:53,455 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:07:53,464 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:07:55,846 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:08:13,988 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 10:08:13,988 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:08:13,988 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 10:08:13,989 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:08:13,998 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:08:15,842 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:08:32,553 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 10:08:32,553 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:08:32,553 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 10:08:32,553 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:08:32,562 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:08:34,415 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:08:50,953 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 10:08:50,953 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:08:50,953 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 10:08:50,953 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:08:50,962 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:08:53,349 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:09:10,516 - root - [INFO] - 	!!!Scores: {'accuracy': 0.623, 'average': 0.623}
2024-05-01 10:09:10,517 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:09:10,517 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 10:09:10,517 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:09:10,525 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:09:12,895 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:09:29,744 - root - [INFO] - 	!!!Scores: {'accuracy': 0.652, 'average': 0.652}
2024-05-01 10:09:29,744 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:09:29,744 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 10:09:29,744 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:09:29,753 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:09:31,552 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:09:48,081 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 10:09:48,081 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:09:48,082 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 10:09:48,082 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:09:48,090 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:09:50,446 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:10:07,316 - root - [INFO] - 	!!!Scores: {'accuracy': 0.629, 'average': 0.629}
2024-05-01 10:10:07,316 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:10:07,316 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 10:10:07,316 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:10:07,324 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:10:09,672 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:10:27,418 - root - [INFO] - 	!!!Scores: {'accuracy': 0.662, 'average': 0.662}
2024-05-01 10:10:27,418 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:10:27,418 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 10:10:27,418 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:10:27,427 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:10:29,269 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:10:45,794 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 10:10:45,795 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:10:45,795 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 10:10:45,795 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:10:46,507 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:10:46,560 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:10:48,377 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:11:06,338 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 10:11:06,339 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:11:06,339 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 10:11:06,339 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:11:06,348 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:11:08,204 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:11:24,465 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 10:11:24,465 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:11:24,465 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 10:11:24,465 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:11:24,474 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:11:26,311 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:11:42,175 - root - [INFO] - 	!!!Scores: {'accuracy': 0.522, 'average': 0.522}
2024-05-01 10:11:42,175 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:11:42,175 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 10:11:42,175 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:11:42,184 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:11:43,998 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:12:00,042 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 10:12:00,042 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:12:00,042 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 10:12:00,042 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:12:00,052 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:12:01,863 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:12:18,320 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 10:12:18,320 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:12:18,320 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 10:12:18,320 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:12:18,329 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:12:20,357 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:12:36,465 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 10:12:36,465 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:12:36,465 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 10:12:36,465 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:12:36,474 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:12:38,873 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:12:56,517 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 10:12:56,518 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:12:56,518 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 10:12:56,518 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:12:56,527 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:12:58,388 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:13:14,786 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 10:13:14,786 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:13:14,786 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 10:13:14,786 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:13:14,796 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:13:16,652 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:13:32,884 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 10:13:32,884 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:13:32,884 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 10:13:32,884 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:13:32,892 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:13:35,289 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:13:52,142 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 10:13:52,142 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:13:52,142 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 10:13:52,142 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:13:52,151 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:13:54,540 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:14:11,099 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 10:14:11,099 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:14:11,099 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 10:14:11,099 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:14:11,107 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:14:12,929 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:14:29,170 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 10:14:29,170 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:14:29,170 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 10:14:29,170 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:14:29,179 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:14:31,552 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:14:48,083 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 10:14:48,083 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:14:48,083 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 10:14:48,083 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:14:48,091 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:14:50,454 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:15:07,823 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 10:15:07,823 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:15:07,823 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 10:15:07,823 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:15:07,832 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:15:09,691 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:15:25,927 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 10:15:25,927 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:15:25,927 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 10:15:25,927 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:15:26,840 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:15:26,900 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:15:29,068 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:15:54,417 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 10:15:54,418 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:15:54,418 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 10:15:54,418 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:15:54,427 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:15:56,659 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:16:19,540 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 10:16:19,540 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:16:19,540 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 10:16:19,540 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:16:19,549 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:16:21,741 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:16:44,230 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 10:16:44,230 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:16:44,230 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 10:16:44,230 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:16:44,240 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:16:46,414 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:17:09,102 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 10:17:09,102 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:17:09,102 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 10:17:09,102 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:17:09,111 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:17:11,283 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:17:34,531 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 10:17:34,532 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:17:34,532 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 10:17:34,532 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:17:34,541 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:17:36,735 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:17:59,487 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 10:17:59,487 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:17:59,487 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 10:17:59,487 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:17:59,496 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:18:02,364 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:18:27,243 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 10:18:27,243 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:18:27,243 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 10:18:27,243 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:18:27,253 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:18:29,478 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:18:52,619 - root - [INFO] - 	!!!Scores: {'accuracy': 0.489, 'average': 0.489}
2024-05-01 10:18:52,620 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:18:52,620 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 10:18:52,620 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:18:52,629 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:18:54,861 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:19:17,703 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 10:19:17,703 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:19:17,703 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 10:19:17,703 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:19:17,711 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:19:20,578 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:19:44,312 - root - [INFO] - 	!!!Scores: {'accuracy': 0.489, 'average': 0.489}
2024-05-01 10:19:44,312 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:19:44,312 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 10:19:44,312 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:19:44,321 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:19:47,177 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:20:10,543 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 10:20:10,543 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:20:10,543 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 10:20:10,543 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:20:10,551 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:20:12,728 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:20:35,581 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 10:20:35,581 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:20:35,581 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 10:20:35,582 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:20:35,591 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:20:38,429 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:21:01,796 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 10:21:01,796 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:21:01,796 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 10:21:01,796 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:21:01,804 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:21:04,628 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:21:29,021 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 10:21:29,021 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:21:29,021 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 10:21:29,021 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:21:29,029 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:21:31,245 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:21:54,101 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 10:21:54,176 - root - [INFO] - Unexpected keys: []
2024-05-01 10:21:54,407 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:21:54,407 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 10:21:54,407 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:21:54,416 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 10:21:55,101 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:21:55,124 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:21:55,642 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:01,411 - root - [INFO] - 	!!!Scores: {'accuracy': 0.816, 'average': 0.816}
2024-05-01 10:22:01,411 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:22:01,411 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 10:22:01,411 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:01,420 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:22:01,941 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:07,397 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 10:22:07,397 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:22:07,397 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 10:22:07,398 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:07,406 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:22:07,926 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:13,382 - root - [INFO] - 	!!!Scores: {'accuracy': 0.816, 'average': 0.816}
2024-05-01 10:22:13,382 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:22:13,382 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 10:22:13,382 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:13,391 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:22:13,906 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:19,288 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:22:19,288 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:22:19,288 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 10:22:19,288 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:19,297 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:22:19,813 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:25,266 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 10:22:25,267 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:22:25,267 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 10:22:25,267 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:25,275 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:22:25,795 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:31,249 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 10:22:31,250 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:22:31,250 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 10:22:31,250 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:31,259 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:22:31,779 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:37,163 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 10:22:37,163 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:22:37,163 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 10:22:37,163 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:37,172 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:22:37,687 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:43,271 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 10:22:43,271 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:22:43,271 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 10:22:43,271 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:43,279 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:22:43,794 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:49,241 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 10:22:49,241 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:22:49,241 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 10:22:49,242 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:49,250 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:22:49,771 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 10:22:55,310 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 10:22:55,310 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:22:55,310 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 10:22:55,310 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:55,318 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 10:22:55,993 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:22:56,002 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:22:56,066 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 10:22:57,518 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:22:57,518 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:22:57,518 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 10:22:57,518 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:57,527 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:22:57,579 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 10:22:59,027 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:22:59,027 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:22:59,027 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 10:22:59,027 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:22:59,035 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:22:59,099 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:00,565 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:23:00,566 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:00,566 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 10:23:00,566 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:00,573 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:00,625 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:02,064 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:23:02,064 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:02,065 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 10:23:02,065 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:02,073 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:02,125 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:03,567 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:23:03,567 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:03,567 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 10:23:03,567 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:03,574 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:03,639 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:05,089 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 10:23:05,089 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:05,089 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 10:23:05,089 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:05,097 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:05,148 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:06,587 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 10:23:06,587 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:06,587 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 10:23:06,587 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:06,596 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:06,661 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:08,115 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:23:08,115 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:08,115 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 10:23:08,115 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:08,123 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:08,174 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:09,618 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:23:09,618 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:09,618 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 10:23:09,618 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:09,625 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:09,677 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:11,122 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:23:11,122 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:11,122 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 10:23:11,122 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:11,129 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:11,194 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:12,650 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:23:12,650 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:12,650 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 10:23:12,650 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:12,659 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:12,711 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:14,156 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:23:14,156 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:14,156 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 10:23:14,157 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:14,164 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:14,215 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:15,701 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:23:15,701 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:15,701 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 10:23:15,701 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:15,708 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:15,760 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:17,201 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:23:17,201 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:23:17,201 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 10:23:17,202 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:17,209 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:23:17,274 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 10:23:18,751 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:23:18,751 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:23:18,751 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 10:23:18,751 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:19,441 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:23:19,496 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:23:23,127 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:23:32,767 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 10:23:32,767 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:23:32,767 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 10:23:32,767 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:32,776 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:23:36,326 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:23:46,038 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 10:23:46,038 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:23:46,038 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 10:23:46,039 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:46,047 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:23:49,662 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:23:59,443 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 10:23:59,443 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:23:59,443 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 10:23:59,443 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:23:59,452 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:24:03,101 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:24:13,199 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 10:24:13,199 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:24:13,199 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 10:24:13,200 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:24:13,208 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:24:16,817 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:24:26,800 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 10:24:26,800 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:24:26,800 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 10:24:26,801 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:24:26,809 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 10:24:27,494 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:24:27,547 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:24:29,047 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 10:24:34,382 - root - [INFO] - 	!!!Scores: {'accuracy': 0.663, 'average': 0.663}
2024-05-01 10:24:34,382 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:24:34,383 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 10:24:34,383 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:24:34,391 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:24:35,891 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 10:24:41,087 - root - [INFO] - 	!!!Scores: {'accuracy': 0.652, 'average': 0.652}
2024-05-01 10:24:41,087 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:24:41,087 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 10:24:41,087 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:24:41,096 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:24:42,600 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 10:24:48,307 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 10:24:48,307 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:24:48,307 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 10:24:48,308 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:24:48,316 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:24:49,820 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 10:24:55,645 - root - [INFO] - 	!!!Scores: {'accuracy': 0.526, 'average': 0.526}
2024-05-01 10:24:55,645 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:24:55,645 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 10:24:55,645 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:24:55,654 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:24:57,141 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 10:25:02,552 - root - [INFO] - 	!!!Scores: {'accuracy': 0.536, 'average': 0.536}
2024-05-01 10:25:02,552 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:25:02,552 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 10:25:02,553 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:02,561 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:25:04,061 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 10:25:09,901 - root - [INFO] - 	!!!Scores: {'accuracy': 0.629, 'average': 0.629}
2024-05-01 10:25:09,901 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:25:09,901 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 10:25:09,901 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:09,910 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:25:11,405 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 10:25:16,821 - root - [INFO] - 	!!!Scores: {'accuracy': 0.657, 'average': 0.657}
2024-05-01 10:25:16,821 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:25:16,821 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 10:25:16,821 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:16,830 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:25:18,315 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 10:25:23,874 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 10:25:23,874 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:25:23,874 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 10:25:23,874 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:23,881 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:25:25,515 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 10:25:31,596 - root - [INFO] - 	!!!Scores: {'accuracy': 0.634, 'average': 0.634}
2024-05-01 10:25:31,596 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:25:31,596 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 10:25:31,596 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:31,604 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:25:33,100 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 10:25:37,821 - root - [INFO] - 	!!!Scores: {'accuracy': 0.65, 'average': 0.65}
2024-05-01 10:25:37,821 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:37,821 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 10:25:37,821 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:37,829 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 10:25:38,574 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:25:38,589 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:38,780 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:40,572 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 10:25:40,572 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:40,572 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 10:25:40,572 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:40,581 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:40,755 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:42,518 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 10:25:42,518 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:42,518 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 10:25:42,519 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:42,527 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:42,746 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:44,670 - root - [INFO] - 	!!!Scores: {'accuracy': 0.708, 'average': 0.708}
2024-05-01 10:25:44,670 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:44,670 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 10:25:44,670 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:44,678 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:44,896 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:46,828 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 10:25:46,829 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:46,829 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 10:25:46,829 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:46,837 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:47,021 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:48,770 - root - [INFO] - 	!!!Scores: {'accuracy': 0.583, 'average': 0.583}
2024-05-01 10:25:48,771 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:48,771 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 10:25:48,771 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:48,778 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:48,956 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:50,761 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 10:25:50,761 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:50,761 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 10:25:50,761 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:50,769 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:50,943 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:52,750 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 10:25:52,751 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:52,751 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 10:25:52,751 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:52,758 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:53,032 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:54,802 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 10:25:54,802 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:54,802 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 10:25:54,803 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:54,811 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:54,986 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:56,785 - root - [INFO] - 	!!!Scores: {'accuracy': 0.583, 'average': 0.583}
2024-05-01 10:25:56,785 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:25:56,785 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 10:25:56,785 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:56,792 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:25:57,083 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 10:25:58,844 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 10:25:58,844 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:25:58,844 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 10:25:58,844 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:25:58,851 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 10:25:59,532 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:25:59,548 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:25:59,767 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 10:26:01,359 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 10:26:01,359 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:26:01,359 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 10:26:01,359 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:26:01,367 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:26:01,576 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 10:26:03,194 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 10:26:03,194 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:26:03,194 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 10:26:03,194 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:26:03,201 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:26:03,412 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 10:26:05,009 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 10:26:05,009 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:26:05,009 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 10:26:05,009 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:26:05,018 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:26:05,241 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 10:26:06,771 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 10:26:06,771 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:26:06,771 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 10:26:06,771 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:26:06,779 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:26:06,988 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 10:26:08,590 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 10:26:08,590 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:26:08,591 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 10:26:08,591 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:26:08,599 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:26:08,810 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 10:26:10,415 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 10:26:10,416 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:26:10,416 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 10:26:10,416 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:26:10,423 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:26:10,631 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 10:26:12,191 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 10:26:12,191 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:26:12,191 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 10:26:12,192 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:26:12,199 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:26:12,409 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 10:26:13,965 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 10:26:13,965 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 10:26:13,965 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 10:26:13,965 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:26:13,979 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 10:26:14,671 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:26:15,358 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 10:26:37,288 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 10:31:50,677 - root - [INFO] - 	!!!Scores: {'accuracy': 0.427, 'average': 0.427}
2024-05-01 10:31:50,677 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:31:50,677 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 10:31:50,677 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:31:51,362 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 10:31:51,472 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:31:57,494 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:32:22,014 - root - [INFO] - 	!!!Scores: {'accuracy': 0.924, 'average': 0.924}
2024-05-01 10:32:22,014 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:32:22,015 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 10:32:22,015 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:32:22,024 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:32:28,086 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:32:53,011 - root - [INFO] - 	!!!Scores: {'accuracy': 0.925, 'average': 0.925}
2024-05-01 10:32:53,012 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:32:53,012 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 10:32:53,012 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:32:53,021 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:32:59,016 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:33:23,795 - root - [INFO] - 	!!!Scores: {'accuracy': 0.93, 'average': 0.93}
2024-05-01 10:33:23,795 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:33:23,795 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 10:33:23,795 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:33:23,805 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:33:29,858 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:33:54,655 - root - [INFO] - 	!!!Scores: {'accuracy': 0.925, 'average': 0.925}
2024-05-01 10:33:54,656 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 10:33:54,656 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 10:33:54,656 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:33:54,665 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 10:34:00,882 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 10:34:26,350 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 10:34:26,351 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:34:26,351 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 10:34:26,351 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:34:27,068 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:34:27,118 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:34:28,935 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:34:47,540 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 10:34:47,540 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:34:47,540 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 10:34:47,540 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:34:47,555 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:34:49,412 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:35:06,007 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 10:35:06,007 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:35:06,007 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 10:35:06,008 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:35:06,017 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:35:07,834 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:35:24,120 - root - [INFO] - 	!!!Scores: {'accuracy': 0.685, 'average': 0.685}
2024-05-01 10:35:24,120 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:35:24,120 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 10:35:24,120 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:35:24,129 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:35:25,932 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:35:42,376 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 10:35:42,376 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:35:42,376 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 10:35:42,376 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:35:42,385 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:35:44,186 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:36:00,981 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 10:36:00,982 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:36:00,982 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 10:36:00,982 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:36:00,991 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:36:02,814 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:36:19,294 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 10:36:19,294 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:36:19,294 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 10:36:19,294 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:36:19,303 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:36:21,675 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:36:39,818 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 10:36:39,818 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:36:39,818 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 10:36:39,818 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:36:39,827 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:36:41,668 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:36:58,378 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 10:36:58,378 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:36:58,378 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 10:36:58,379 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:36:58,388 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:37:00,224 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:37:16,752 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 10:37:16,752 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:37:16,752 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 10:37:16,752 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:37:16,760 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:37:19,131 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:37:36,298 - root - [INFO] - 	!!!Scores: {'accuracy': 0.615, 'average': 0.615}
2024-05-01 10:37:36,299 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:37:36,299 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 10:37:36,299 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:37:36,308 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:37:38,670 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:37:55,527 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 10:37:55,527 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:37:55,527 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 10:37:55,527 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:37:55,536 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:37:57,335 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:38:13,867 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 10:38:13,867 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:38:13,867 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 10:38:13,867 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:38:13,875 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:38:16,226 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:38:33,102 - root - [INFO] - 	!!!Scores: {'accuracy': 0.628, 'average': 0.628}
2024-05-01 10:38:33,103 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:38:33,103 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 10:38:33,103 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:38:33,110 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:38:35,455 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:38:53,206 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 10:38:53,206 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 10:38:53,206 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 10:38:53,206 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:38:53,215 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:38:55,050 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:39:11,584 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 10:39:11,585 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:39:11,585 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 10:39:11,585 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:39:22,304 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:39:22,358 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:39:24,155 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:39:42,045 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 10:39:42,045 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:39:42,045 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 10:39:42,045 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:39:42,053 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:39:43,891 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:40:00,150 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 10:40:00,150 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:40:00,150 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 10:40:00,150 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:40:00,159 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:40:01,973 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:40:17,836 - root - [INFO] - 	!!!Scores: {'accuracy': 0.527, 'average': 0.527}
2024-05-01 10:40:17,837 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:40:17,837 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 10:40:17,837 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:40:17,845 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:40:19,643 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:40:35,732 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 10:40:35,732 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:40:35,732 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 10:40:35,732 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:40:35,740 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:40:37,536 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:40:54,141 - root - [INFO] - 	!!!Scores: {'accuracy': 0.509, 'average': 0.509}
2024-05-01 10:40:54,141 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:40:54,142 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 10:40:54,142 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:40:54,150 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:40:55,996 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:41:12,227 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 10:41:12,227 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:41:12,227 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 10:41:12,227 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:41:12,236 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:41:14,614 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:41:32,285 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 10:41:32,285 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:41:32,285 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 10:41:32,285 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:41:32,293 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:41:34,151 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:41:50,562 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 10:41:50,562 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:41:50,562 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 10:41:50,562 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:41:50,570 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:41:52,416 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:42:08,690 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 10:42:08,690 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:42:08,690 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 10:42:08,691 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:42:08,700 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:42:11,080 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:42:27,967 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 10:42:27,967 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:42:27,967 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 10:42:27,967 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:42:27,975 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:42:30,349 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:42:46,957 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 10:42:46,957 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:42:46,957 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 10:42:46,958 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:42:46,965 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:42:48,780 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:43:05,053 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 10:43:05,053 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:43:05,053 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 10:43:05,054 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:43:05,061 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:43:07,422 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:43:23,985 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 10:43:23,985 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:43:23,985 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 10:43:23,985 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:43:23,994 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:43:26,347 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:43:43,734 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 10:43:43,735 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 10:43:43,735 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 10:43:43,735 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:43:43,743 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 10:43:45,597 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 10:44:01,868 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 10:44:01,868 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:44:01,868 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 10:44:01,869 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:44:12,576 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:44:12,635 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:44:14,804 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:44:40,074 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 10:44:40,074 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:44:40,074 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 10:44:40,074 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:44:40,083 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:44:42,307 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:45:05,216 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 10:45:05,216 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:45:05,216 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 10:45:05,216 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:45:05,224 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:45:07,428 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:45:30,035 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 10:45:30,035 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:45:30,035 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 10:45:30,035 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:45:30,043 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:45:32,230 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:45:55,059 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 10:45:55,059 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:45:55,059 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 10:45:55,059 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:45:55,068 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:45:57,231 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:46:20,491 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 10:46:20,491 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:46:20,491 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 10:46:20,491 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:46:20,499 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:46:22,688 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:46:45,454 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 10:46:45,454 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:46:45,454 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 10:46:45,454 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:46:45,462 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:46:48,316 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:47:13,205 - root - [INFO] - 	!!!Scores: {'accuracy': 0.49, 'average': 0.49}
2024-05-01 10:47:13,206 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:47:13,206 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 10:47:13,206 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:47:13,215 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:47:15,434 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:47:38,585 - root - [INFO] - 	!!!Scores: {'accuracy': 0.486, 'average': 0.486}
2024-05-01 10:47:38,585 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:47:38,585 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 10:47:38,585 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:47:38,593 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:47:41,128 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:48:03,993 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 10:48:03,993 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:48:03,993 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 10:48:03,993 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:48:04,001 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:48:06,878 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:48:30,629 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 10:48:30,629 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:48:30,629 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 10:48:30,629 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:48:30,638 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:48:33,510 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:48:56,886 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 10:48:56,886 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:48:56,886 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 10:48:56,886 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:48:56,894 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:48:59,081 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:49:21,950 - root - [INFO] - 	!!!Scores: {'accuracy': 0.486, 'average': 0.486}
2024-05-01 10:49:21,950 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:49:21,950 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 10:49:21,950 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:49:21,959 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:49:24,800 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:49:48,180 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 10:49:48,180 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:49:48,181 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 10:49:48,181 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:49:48,189 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:49:51,019 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:50:15,454 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 10:50:15,454 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 10:50:15,454 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 10:50:15,454 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:50:15,462 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 10:50:17,699 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 10:50:40,721 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 10:50:40,797 - root - [INFO] - Unexpected keys: []
2024-05-01 10:50:41,033 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:50:41,033 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 10:50:41,033 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:50:41,042 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 10:50:41,723 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:50:41,746 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:50:42,263 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 10:50:48,044 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 10:50:48,044 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:50:48,044 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 10:50:48,044 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:50:48,052 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:50:48,567 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 10:50:54,021 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:50:54,021 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:50:54,021 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 10:50:54,021 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:50:54,029 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:50:54,544 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 10:50:59,993 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 10:50:59,993 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:50:59,993 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 10:50:59,993 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:00,001 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:51:00,512 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 10:51:05,889 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 10:51:05,889 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:51:05,890 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 10:51:05,890 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:05,897 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:51:06,409 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 10:51:11,857 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 10:51:11,857 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:51:11,857 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 10:51:11,858 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:11,865 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:51:12,381 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 10:51:17,835 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 10:51:17,835 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:51:17,835 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 10:51:17,835 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:17,843 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:51:18,358 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 10:51:23,747 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 10:51:23,747 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:51:23,747 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 10:51:23,748 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:23,755 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:51:24,268 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 10:51:29,853 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:51:29,853 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:51:29,853 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 10:51:29,853 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:29,861 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:51:30,373 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 10:51:35,819 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 10:51:35,819 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 10:51:35,820 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 10:51:35,820 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:35,827 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 10:51:36,345 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 10:51:41,890 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 10:51:41,890 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:41,890 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 10:51:41,891 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:41,898 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 10:51:42,827 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:51:42,835 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:42,900 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:44,345 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:51:44,345 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:44,345 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 10:51:44,345 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:44,353 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:44,404 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:45,852 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:51:45,852 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:45,852 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 10:51:45,852 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:45,860 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:45,924 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:47,392 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:51:47,392 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:47,392 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 10:51:47,393 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:47,400 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:47,451 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:48,888 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:51:48,888 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:48,889 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 10:51:48,889 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:48,896 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:48,947 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:50,389 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:51:50,389 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:50,389 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 10:51:50,389 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:50,397 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:50,461 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:51,912 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 10:51:51,912 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:51,912 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 10:51:51,912 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:51,920 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:51,971 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:53,410 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 10:51:53,410 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:53,410 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 10:51:53,410 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:53,418 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:53,482 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:54,932 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 10:51:54,932 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:54,932 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 10:51:54,933 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:54,940 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:54,991 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:56,437 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:51:56,437 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:56,437 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 10:51:56,437 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:56,445 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:56,497 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:57,943 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:51:57,943 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:57,943 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 10:51:57,943 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:57,951 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:58,015 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 10:51:59,472 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 10:51:59,472 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:51:59,472 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 10:51:59,472 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:51:59,480 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:51:59,530 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 10:52:00,974 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 10:52:00,974 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:52:00,974 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 10:52:00,974 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:52:00,982 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:52:01,033 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 10:52:02,520 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:52:02,520 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:52:02,520 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 10:52:02,520 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:52:02,528 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:52:02,579 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 10:52:04,022 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 10:52:04,022 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 10:52:04,022 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 10:52:04,022 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:52:04,030 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 10:52:04,094 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 10:52:05,572 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 10:52:05,572 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:52:05,572 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 10:52:05,572 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:52:06,500 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:52:06,555 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:52:10,147 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:52:19,787 - root - [INFO] - 	!!!Scores: {'accuracy': 0.715, 'average': 0.715}
2024-05-01 10:52:19,787 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:52:19,787 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 10:52:19,787 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:52:19,796 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:52:23,328 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:52:33,041 - root - [INFO] - 	!!!Scores: {'accuracy': 0.698, 'average': 0.698}
2024-05-01 10:52:33,041 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:52:33,041 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 10:52:33,041 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:52:33,049 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:52:36,648 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:52:46,435 - root - [INFO] - 	!!!Scores: {'accuracy': 0.697, 'average': 0.697}
2024-05-01 10:52:46,435 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:52:46,435 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 10:52:46,435 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:52:46,443 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:52:50,081 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:53:00,188 - root - [INFO] - 	!!!Scores: {'accuracy': 0.696, 'average': 0.696}
2024-05-01 10:53:00,188 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 10:53:00,188 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 10:53:00,188 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:53:00,196 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 10:53:03,799 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 10:53:13,794 - root - [INFO] - 	!!!Scores: {'accuracy': 0.7, 'average': 0.7}
2024-05-01 10:53:13,794 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:53:13,794 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 10:53:13,794 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:53:13,802 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 10:53:14,742 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:53:14,794 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:53:16,298 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 10:53:21,626 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 10:53:21,626 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:53:21,626 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 10:53:21,626 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:53:21,634 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:53:23,125 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 10:53:28,316 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 10:53:28,316 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:53:28,316 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 10:53:28,316 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:53:28,325 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:53:29,823 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 10:53:35,530 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 10:53:35,530 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:53:35,530 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 10:53:35,530 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:53:35,538 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:53:37,030 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 10:53:42,862 - root - [INFO] - 	!!!Scores: {'accuracy': 0.536, 'average': 0.536}
2024-05-01 10:53:42,862 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:53:42,862 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 10:53:42,862 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:53:42,870 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:53:44,349 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 10:53:49,767 - root - [INFO] - 	!!!Scores: {'accuracy': 0.553, 'average': 0.553}
2024-05-01 10:53:49,767 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:53:49,767 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 10:53:49,767 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:53:49,775 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:53:51,273 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 10:53:57,112 - root - [INFO] - 	!!!Scores: {'accuracy': 0.66, 'average': 0.66}
2024-05-01 10:53:57,112 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:53:57,112 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 10:53:57,112 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:53:57,120 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:53:58,621 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 10:54:04,035 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 10:54:04,035 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:54:04,035 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 10:54:04,035 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:04,043 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:54:05,520 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 10:54:11,087 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 10:54:11,087 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:54:11,087 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 10:54:11,087 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:11,095 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:54:12,590 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 10:54:18,673 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 10:54:18,673 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 10:54:18,673 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 10:54:18,673 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:18,681 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 10:54:20,154 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 10:54:24,888 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 10:54:24,888 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:24,888 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 10:54:24,889 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:24,896 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 10:54:25,796 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:54:25,812 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:26,001 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:27,792 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 10:54:27,792 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:27,793 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 10:54:27,793 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:27,800 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:27,973 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:29,735 - root - [INFO] - 	!!!Scores: {'accuracy': 0.542, 'average': 0.542}
2024-05-01 10:54:29,735 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:29,735 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 10:54:29,735 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:29,743 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:29,959 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:31,885 - root - [INFO] - 	!!!Scores: {'accuracy': 0.722, 'average': 0.722}
2024-05-01 10:54:31,885 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:31,885 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 10:54:31,885 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:31,893 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:32,108 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:34,037 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 10:54:34,038 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:34,038 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 10:54:34,038 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:34,045 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:34,228 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:35,981 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 10:54:35,981 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:35,981 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 10:54:35,981 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:35,989 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:36,164 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:37,966 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 10:54:37,966 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:37,966 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 10:54:37,966 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:37,974 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:38,146 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:39,957 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 10:54:39,957 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:39,957 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 10:54:39,957 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:39,965 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:40,237 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:42,008 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 10:54:42,008 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:42,008 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 10:54:42,008 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:42,016 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:42,189 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:43,985 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 10:54:43,986 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 10:54:43,986 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 10:54:43,986 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:43,993 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 10:54:44,282 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 10:54:46,044 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 10:54:46,044 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:54:46,044 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 10:54:46,045 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:46,052 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 10:54:46,957 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:54:46,972 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:54:47,188 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 10:54:48,775 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 10:54:48,775 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:54:48,775 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 10:54:48,775 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:48,783 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:54:48,990 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 10:54:50,608 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 10:54:50,608 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:54:50,608 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 10:54:50,608 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:50,616 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:54:50,823 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 10:54:52,420 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 10:54:52,420 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:54:52,420 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 10:54:52,420 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:52,427 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:54:52,650 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 10:54:54,181 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 10:54:54,181 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:54:54,181 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 10:54:54,182 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:54,189 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:54:54,396 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 10:54:55,994 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 10:54:55,994 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:54:55,994 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 10:54:55,994 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:56,002 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:54:56,211 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 10:54:57,817 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 10:54:57,817 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:54:57,817 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 10:54:57,817 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:57,825 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:54:58,032 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 10:54:59,591 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 10:54:59,591 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 10:54:59,591 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 10:54:59,591 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:54:59,599 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 10:54:59,806 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 10:55:01,363 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 10:55:01,363 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 10:55:01,363 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 10:55:01,363 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 10:55:01,377 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 10:55:02,308 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 10:55:02,991 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 10:55:24,775 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 11:00:38,294 - root - [INFO] - 	!!!Scores: {'accuracy': 0.426, 'average': 0.426}
2024-05-01 11:00:38,295 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:00:38,295 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 11:00:38,295 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:00:39,212 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 11:00:39,323 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:00:45,332 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:01:09,642 - root - [INFO] - 	!!!Scores: {'accuracy': 0.919, 'average': 0.919}
2024-05-01 11:01:09,642 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:01:09,642 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 11:01:09,642 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:01:09,651 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:01:15,726 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:01:40,472 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 11:01:40,472 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:01:40,472 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 11:01:40,472 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:01:40,480 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:01:46,510 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:02:11,114 - root - [INFO] - 	!!!Scores: {'accuracy': 0.925, 'average': 0.925}
2024-05-01 11:02:11,115 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:02:11,115 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 11:02:11,115 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:02:11,124 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:02:17,193 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:02:41,815 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 11:02:41,815 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:02:41,815 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 11:02:41,815 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:02:41,823 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:02:47,863 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:03:13,165 - root - [INFO] - 	!!!Scores: {'accuracy': 0.918, 'average': 0.918}
2024-05-01 11:03:13,165 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:03:13,165 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 11:03:13,165 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:03:14,116 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:03:14,169 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:03:15,998 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:03:34,487 - root - [INFO] - 	!!!Scores: {'accuracy': 0.628, 'average': 0.628}
2024-05-01 11:03:34,488 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:03:34,488 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 11:03:34,488 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:03:34,495 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:03:36,341 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:03:52,894 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 11:03:52,895 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:03:52,895 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 11:03:52,895 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:03:52,903 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:03:54,724 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:04:10,994 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 11:04:10,994 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:04:10,994 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 11:04:10,994 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:04:11,002 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:04:12,808 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:04:29,248 - root - [INFO] - 	!!!Scores: {'accuracy': 0.662, 'average': 0.662}
2024-05-01 11:04:29,248 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:04:29,248 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 11:04:29,248 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:04:29,257 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:04:31,061 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:04:47,857 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 11:04:47,857 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:04:47,857 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 11:04:47,857 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:04:47,865 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:04:49,690 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:05:06,170 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 11:05:06,170 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:05:06,170 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 11:05:06,170 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:05:06,178 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:05:08,559 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:05:26,713 - root - [INFO] - 	!!!Scores: {'accuracy': 0.646, 'average': 0.646}
2024-05-01 11:05:26,713 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:05:26,713 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 11:05:26,714 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:05:26,722 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:05:28,570 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:05:45,288 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 11:05:45,288 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:05:45,288 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 11:05:45,288 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:05:45,296 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:05:47,151 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:06:03,694 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 11:06:03,695 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:06:03,695 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 11:06:03,695 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:06:03,707 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:06:06,091 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:06:23,274 - root - [INFO] - 	!!!Scores: {'accuracy': 0.622, 'average': 0.622}
2024-05-01 11:06:23,274 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:06:23,274 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 11:06:23,274 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:06:23,282 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:06:25,657 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:06:42,525 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 11:06:42,525 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:06:42,525 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 11:06:42,525 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:06:42,534 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:06:44,340 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:07:00,881 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 11:07:00,881 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:07:00,881 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 11:07:00,881 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:07:00,889 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:07:03,262 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:07:20,146 - root - [INFO] - 	!!!Scores: {'accuracy': 0.624, 'average': 0.624}
2024-05-01 11:07:20,146 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:07:20,146 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 11:07:20,146 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:07:20,154 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:07:22,504 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:07:40,255 - root - [INFO] - 	!!!Scores: {'accuracy': 0.654, 'average': 0.654}
2024-05-01 11:07:40,255 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:07:40,255 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 11:07:40,255 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:07:40,263 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:07:42,106 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:07:58,648 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 11:07:58,648 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:07:58,648 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 11:07:58,648 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:07:59,583 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:07:59,636 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:08:01,450 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:08:19,391 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 11:08:19,391 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:08:19,392 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 11:08:19,392 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:08:19,399 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:08:21,251 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:08:37,513 - root - [INFO] - 	!!!Scores: {'accuracy': 0.524, 'average': 0.524}
2024-05-01 11:08:37,514 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:08:37,514 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 11:08:37,514 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:08:37,522 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:08:39,350 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:08:55,219 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 11:08:55,219 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:08:55,219 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 11:08:55,219 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:08:55,228 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:08:57,027 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:09:13,083 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 11:09:13,084 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:09:13,084 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 11:09:13,084 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:09:13,092 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:09:14,882 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:09:31,352 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 11:09:31,352 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:09:31,352 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 11:09:31,353 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:09:31,360 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:09:33,171 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:09:49,289 - root - [INFO] - 	!!!Scores: {'accuracy': 0.525, 'average': 0.525}
2024-05-01 11:09:49,289 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:09:49,289 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 11:09:49,289 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:09:49,297 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:09:51,674 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:10:09,331 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 11:10:09,331 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:10:09,332 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 11:10:09,332 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:10:09,340 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:10:11,183 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:10:27,588 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 11:10:27,588 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:10:27,588 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 11:10:27,588 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:10:27,596 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:10:29,430 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:10:45,681 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 11:10:45,681 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:10:45,681 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 11:10:45,681 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:10:45,689 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:10:48,063 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:11:04,928 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 11:11:04,928 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:11:04,928 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 11:11:04,929 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:11:04,936 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:11:07,309 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:11:23,892 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 11:11:23,892 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:11:23,892 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 11:11:23,893 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:11:23,901 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:11:25,699 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:11:41,958 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 11:11:41,959 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:11:41,959 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 11:11:41,959 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:11:41,967 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:11:44,322 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:12:00,879 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 11:12:00,879 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:12:00,879 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 11:12:00,879 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:12:00,887 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:12:03,436 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:12:20,826 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 11:12:20,826 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:12:20,826 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 11:12:20,826 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:12:20,834 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:12:22,686 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:12:38,947 - root - [INFO] - 	!!!Scores: {'accuracy': 0.509, 'average': 0.509}
2024-05-01 11:12:38,948 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:12:38,948 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 11:12:38,948 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:12:39,865 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:12:39,926 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:12:42,085 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:13:07,393 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 11:13:07,393 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:13:07,394 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 11:13:07,394 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:13:07,402 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:13:09,605 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:13:32,472 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 11:13:32,472 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:13:32,472 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 11:13:32,472 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:13:32,480 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:13:34,655 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:13:57,154 - root - [INFO] - 	!!!Scores: {'accuracy': 0.483, 'average': 0.483}
2024-05-01 11:13:57,155 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:13:57,155 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 11:13:57,155 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:13:57,164 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:13:59,321 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:14:22,006 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 11:14:22,006 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:14:22,006 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 11:14:22,006 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:14:22,014 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:14:24,167 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:14:47,422 - root - [INFO] - 	!!!Scores: {'accuracy': 0.489, 'average': 0.489}
2024-05-01 11:14:47,423 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:14:47,423 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 11:14:47,423 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:14:47,431 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:14:49,608 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:15:12,475 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 11:15:12,475 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:15:12,475 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 11:15:12,475 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:15:12,484 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:15:15,339 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:15:40,308 - root - [INFO] - 	!!!Scores: {'accuracy': 0.489, 'average': 0.489}
2024-05-01 11:15:40,308 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:15:40,308 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 11:15:40,308 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:15:40,316 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:15:42,519 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:16:05,672 - root - [INFO] - 	!!!Scores: {'accuracy': 0.479, 'average': 0.479}
2024-05-01 11:16:05,673 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:16:05,673 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 11:16:05,673 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:16:05,681 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:16:07,895 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:16:30,746 - root - [INFO] - 	!!!Scores: {'accuracy': 0.485, 'average': 0.485}
2024-05-01 11:16:30,746 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:16:30,746 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 11:16:30,746 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:16:30,754 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:16:33,602 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:16:57,326 - root - [INFO] - 	!!!Scores: {'accuracy': 0.49, 'average': 0.49}
2024-05-01 11:16:57,326 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:16:57,326 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 11:16:57,326 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:16:57,334 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:17:00,171 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:17:23,539 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 11:17:23,540 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:17:23,540 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 11:17:23,540 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:17:23,549 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:17:25,710 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:17:48,562 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 11:17:48,562 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:17:48,562 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 11:17:48,562 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:17:48,570 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:17:51,411 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:18:14,782 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 11:18:14,782 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:18:14,782 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 11:18:14,782 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:18:14,790 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:18:17,609 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:18:42,019 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 11:18:42,019 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:18:42,019 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 11:18:42,019 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:18:42,028 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:18:44,240 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:19:07,096 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 11:19:07,172 - root - [INFO] - Unexpected keys: []
2024-05-01 11:19:07,401 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:19:07,402 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 11:19:07,402 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:19:07,410 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 11:19:08,326 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:19:08,349 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:19:08,864 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 11:19:14,619 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 11:19:14,619 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:19:14,620 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 11:19:14,620 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:19:14,627 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:19:15,144 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 11:19:20,593 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 11:19:20,593 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:19:20,593 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 11:19:20,593 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:19:20,601 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:19:21,126 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 11:19:26,576 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 11:19:26,577 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:19:26,577 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 11:19:26,577 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:19:26,585 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:19:27,097 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 11:19:32,479 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 11:19:32,479 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:19:32,479 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 11:19:32,479 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:19:32,487 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:19:33,000 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 11:19:38,448 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 11:19:38,449 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:19:38,449 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 11:19:38,449 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:19:38,456 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:19:38,973 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 11:19:44,425 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:19:44,425 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:19:44,425 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 11:19:44,426 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:19:44,434 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:19:44,952 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 11:19:50,336 - root - [INFO] - 	!!!Scores: {'accuracy': 0.771, 'average': 0.771}
2024-05-01 11:19:50,336 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:19:50,336 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 11:19:50,337 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:19:50,345 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:19:50,858 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 11:19:56,440 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 11:19:56,440 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:19:56,440 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 11:19:56,440 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:19:56,449 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:19:56,961 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 11:20:02,406 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 11:20:02,406 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:20:02,406 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 11:20:02,406 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:02,414 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:20:02,932 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 11:20:08,475 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 11:20:08,475 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:08,475 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 11:20:08,475 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:08,483 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 11:20:09,173 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:20:09,181 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:09,245 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:10,690 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:20:10,690 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:10,690 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 11:20:10,690 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:10,698 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:10,749 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:12,198 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:20:12,198 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:12,198 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 11:20:12,198 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:12,206 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:12,270 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:13,737 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:20:13,737 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:13,737 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 11:20:13,737 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:13,745 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:13,796 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:15,230 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:20:15,230 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:15,230 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 11:20:15,230 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:15,238 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:15,289 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:16,730 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:20:16,730 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:16,730 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 11:20:16,730 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:16,738 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:16,802 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:18,251 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 11:20:18,251 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:18,251 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 11:20:18,251 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:18,258 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:18,309 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:19,747 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:20:19,748 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:19,748 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 11:20:19,748 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:19,755 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:19,819 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:21,270 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 11:20:21,271 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:21,271 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 11:20:21,271 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:21,278 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:21,330 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:22,773 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:20:22,773 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:22,773 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 11:20:22,773 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:22,781 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:22,832 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:24,278 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:20:24,278 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:24,278 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 11:20:24,279 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:24,286 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:24,350 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:25,806 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 11:20:25,806 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:25,806 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 11:20:25,807 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:25,814 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:25,865 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:27,309 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:20:27,309 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:27,309 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 11:20:27,309 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:27,317 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:27,368 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:28,855 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:20:28,855 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:28,855 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 11:20:28,855 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:28,863 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:28,914 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:30,358 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 11:20:30,358 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:20:30,358 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 11:20:30,358 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:30,366 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:20:30,430 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 11:20:31,910 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:20:31,910 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:20:31,910 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 11:20:31,910 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:32,606 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:20:32,660 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:20:36,248 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:20:45,881 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 11:20:45,881 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:20:45,881 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 11:20:45,881 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:45,889 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:20:49,422 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:20:59,136 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 11:20:59,136 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:20:59,136 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 11:20:59,137 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:20:59,144 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:21:02,723 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:21:12,503 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 11:21:12,504 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:21:12,504 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 11:21:12,504 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:21:12,511 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:21:16,138 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:21:26,242 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 11:21:26,243 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:21:26,243 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 11:21:26,243 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:21:26,251 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:21:29,843 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:21:39,824 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 11:21:39,824 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:21:39,824 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 11:21:39,824 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:21:39,832 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 11:21:40,507 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:21:40,559 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:21:42,046 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 11:21:47,368 - root - [INFO] - 	!!!Scores: {'accuracy': 0.686, 'average': 0.686}
2024-05-01 11:21:47,369 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:21:47,369 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 11:21:47,369 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:21:47,377 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:21:48,865 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 11:21:54,051 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 11:21:54,052 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:21:54,052 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 11:21:54,052 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:21:54,060 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:21:55,554 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 11:22:01,260 - root - [INFO] - 	!!!Scores: {'accuracy': 0.682, 'average': 0.682}
2024-05-01 11:22:01,260 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:22:01,260 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 11:22:01,260 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:01,268 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:22:02,757 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 11:22:08,588 - root - [INFO] - 	!!!Scores: {'accuracy': 0.599, 'average': 0.599}
2024-05-01 11:22:08,588 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:22:08,588 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 11:22:08,588 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:08,596 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:22:10,076 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 11:22:15,485 - root - [INFO] - 	!!!Scores: {'accuracy': 0.594, 'average': 0.594}
2024-05-01 11:22:15,486 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:22:15,486 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 11:22:15,486 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:15,493 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:22:16,987 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 11:22:22,822 - root - [INFO] - 	!!!Scores: {'accuracy': 0.7, 'average': 0.7}
2024-05-01 11:22:22,822 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:22:22,822 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 11:22:22,822 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:22,830 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:22:24,318 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 11:22:29,732 - root - [INFO] - 	!!!Scores: {'accuracy': 0.686, 'average': 0.686}
2024-05-01 11:22:29,732 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:22:29,733 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 11:22:29,733 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:29,740 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:22:31,216 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 11:22:36,779 - root - [INFO] - 	!!!Scores: {'accuracy': 0.566, 'average': 0.566}
2024-05-01 11:22:36,779 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:22:36,780 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 11:22:36,780 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:36,787 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:22:38,282 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 11:22:44,370 - root - [INFO] - 	!!!Scores: {'accuracy': 0.662, 'average': 0.662}
2024-05-01 11:22:44,370 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:22:44,370 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 11:22:44,370 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:44,379 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:22:45,853 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 11:22:50,586 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 11:22:50,586 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:22:50,586 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 11:22:50,586 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:50,594 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 11:22:51,298 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:22:51,317 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:22:51,506 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 11:22:53,298 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 11:22:53,298 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:22:53,298 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 11:22:53,298 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:53,306 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:22:53,478 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 11:22:55,238 - root - [INFO] - 	!!!Scores: {'accuracy': 0.542, 'average': 0.542}
2024-05-01 11:22:55,238 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:22:55,238 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 11:22:55,238 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:55,246 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:22:55,462 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 11:22:57,385 - root - [INFO] - 	!!!Scores: {'accuracy': 0.694, 'average': 0.694}
2024-05-01 11:22:57,385 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:22:57,386 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 11:22:57,386 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:57,394 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:22:57,609 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 11:22:59,537 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 11:22:59,537 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:22:59,537 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 11:22:59,537 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:22:59,545 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:22:59,727 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 11:23:01,482 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 11:23:01,482 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:23:01,482 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 11:23:01,482 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:01,490 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:23:01,665 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 11:23:03,465 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 11:23:03,465 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:23:03,465 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 11:23:03,465 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:03,473 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:23:03,645 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 11:23:05,456 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 11:23:05,457 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:23:05,457 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 11:23:05,457 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:05,464 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:23:05,737 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 11:23:07,506 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 11:23:07,506 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:23:07,506 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 11:23:07,506 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:07,514 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:23:07,687 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 11:23:09,484 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 11:23:09,485 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:23:09,485 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 11:23:09,485 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:09,493 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:23:09,780 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 11:23:11,542 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 11:23:11,543 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:23:11,543 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 11:23:11,543 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:11,551 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 11:23:12,238 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:23:12,253 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:23:12,470 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 11:23:14,058 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 11:23:14,058 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:23:14,058 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 11:23:14,058 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:14,066 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:23:14,274 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 11:23:15,895 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 11:23:15,895 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:23:15,895 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 11:23:15,896 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:15,903 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:23:16,112 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 11:23:17,706 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 11:23:17,707 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:23:17,707 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 11:23:17,707 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:17,715 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:23:17,937 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 11:23:19,469 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 11:23:19,469 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:23:19,469 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 11:23:19,469 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:19,477 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:23:19,684 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 11:23:21,284 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 11:23:21,284 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:23:21,285 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 11:23:21,285 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:21,292 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:23:21,502 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 11:23:23,108 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 11:23:23,108 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:23:23,108 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 11:23:23,108 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:23,116 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:23:23,324 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 11:23:24,884 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 11:23:24,885 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:23:24,885 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 11:23:24,885 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:24,893 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:23:25,101 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 11:23:26,655 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 11:23:26,655 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 11:23:26,655 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 11:23:26,655 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:23:26,668 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 11:23:27,365 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:23:28,174 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 11:23:50,257 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 11:29:03,637 - root - [INFO] - 	!!!Scores: {'accuracy': 0.424, 'average': 0.424}
2024-05-01 11:29:03,637 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:29:03,637 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 11:29:03,637 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:29:04,349 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 11:29:04,462 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:29:10,405 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:29:34,714 - root - [INFO] - 	!!!Scores: {'accuracy': 0.918, 'average': 0.918}
2024-05-01 11:29:34,714 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:29:34,714 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 11:29:34,714 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:29:34,722 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:29:40,732 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:30:05,469 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 11:30:05,469 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:30:05,469 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 11:30:05,469 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:30:05,478 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:30:11,463 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:30:36,041 - root - [INFO] - 	!!!Scores: {'accuracy': 0.923, 'average': 0.923}
2024-05-01 11:30:36,041 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:30:36,041 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 11:30:36,041 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:30:36,049 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:30:42,079 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:31:06,702 - root - [INFO] - 	!!!Scores: {'accuracy': 0.914, 'average': 0.914}
2024-05-01 11:31:06,702 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:31:06,702 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 11:31:06,702 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:31:06,711 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:31:12,721 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:31:38,012 - root - [INFO] - 	!!!Scores: {'accuracy': 0.915, 'average': 0.915}
2024-05-01 11:31:38,012 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:31:38,012 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 11:31:38,012 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:31:38,700 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:31:38,754 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:31:40,553 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:31:59,036 - root - [INFO] - 	!!!Scores: {'accuracy': 0.613, 'average': 0.613}
2024-05-01 11:31:59,036 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:31:59,036 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 11:31:59,036 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:31:59,044 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:32:00,883 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:32:17,432 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 11:32:17,432 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:32:17,432 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 11:32:17,432 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:32:17,441 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:32:19,253 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:32:35,509 - root - [INFO] - 	!!!Scores: {'accuracy': 0.682, 'average': 0.682}
2024-05-01 11:32:35,509 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:32:35,509 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 11:32:35,509 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:32:35,517 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:32:37,322 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:32:53,747 - root - [INFO] - 	!!!Scores: {'accuracy': 0.66, 'average': 0.66}
2024-05-01 11:32:53,747 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:32:53,747 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 11:32:53,748 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:32:53,755 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:32:55,550 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:33:12,330 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 11:33:12,330 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:33:12,330 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 11:33:12,330 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:33:12,338 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:33:14,156 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:33:30,624 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 11:33:30,624 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:33:30,624 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 11:33:30,625 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:33:30,634 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:33:33,013 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:33:51,144 - root - [INFO] - 	!!!Scores: {'accuracy': 0.651, 'average': 0.651}
2024-05-01 11:33:51,144 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:33:51,144 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 11:33:51,144 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:33:51,152 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:33:52,989 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:34:09,687 - root - [INFO] - 	!!!Scores: {'accuracy': 0.663, 'average': 0.663}
2024-05-01 11:34:09,687 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:34:09,687 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 11:34:09,688 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:34:09,695 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:34:11,527 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:34:28,051 - root - [INFO] - 	!!!Scores: {'accuracy': 0.662, 'average': 0.662}
2024-05-01 11:34:28,051 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:34:28,051 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 11:34:28,051 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:34:28,059 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:34:30,438 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:34:47,606 - root - [INFO] - 	!!!Scores: {'accuracy': 0.624, 'average': 0.624}
2024-05-01 11:34:47,607 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:34:47,607 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 11:34:47,607 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:34:47,616 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:34:49,987 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:35:06,836 - root - [INFO] - 	!!!Scores: {'accuracy': 0.654, 'average': 0.654}
2024-05-01 11:35:06,836 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:35:06,836 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 11:35:06,836 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:35:06,844 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:35:08,834 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:35:25,360 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 11:35:25,361 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:35:25,361 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 11:35:25,361 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:35:25,368 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:35:27,721 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:35:44,596 - root - [INFO] - 	!!!Scores: {'accuracy': 0.622, 'average': 0.622}
2024-05-01 11:35:44,596 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:35:44,596 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 11:35:44,596 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:35:44,604 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:35:46,960 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:36:04,710 - root - [INFO] - 	!!!Scores: {'accuracy': 0.65, 'average': 0.65}
2024-05-01 11:36:04,710 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 11:36:04,710 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 11:36:04,710 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:36:04,719 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:36:06,549 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:36:23,074 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 11:36:23,074 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:36:23,074 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 11:36:23,075 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:36:23,991 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:36:24,043 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:36:25,840 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:36:43,770 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 11:36:43,771 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:36:43,771 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 11:36:43,771 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:36:43,779 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:36:45,613 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:37:01,864 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 11:37:01,864 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:37:01,864 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 11:37:01,865 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:37:01,873 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:37:03,685 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:37:19,545 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 11:37:19,545 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:37:19,545 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 11:37:19,545 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:37:19,553 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:37:21,360 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:37:37,413 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 11:37:37,413 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:37:37,413 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 11:37:37,413 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:37:37,421 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:37:39,211 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:37:55,683 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 11:37:55,683 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:37:55,683 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 11:37:55,683 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:37:55,691 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:37:57,512 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:38:13,635 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 11:38:13,635 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:38:13,635 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 11:38:13,635 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:38:13,644 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:38:16,019 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:38:33,677 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 11:38:33,678 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:38:33,678 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 11:38:33,678 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:38:33,686 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:38:35,520 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:38:51,917 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 11:38:51,918 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:38:51,918 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 11:38:51,918 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:38:51,925 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:38:53,757 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:39:10,008 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 11:39:10,008 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:39:10,008 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 11:39:10,008 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:39:10,016 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:39:12,386 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:39:29,250 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 11:39:29,250 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:39:29,250 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 11:39:29,250 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:39:29,259 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:39:31,621 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:39:48,196 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 11:39:48,196 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:39:48,196 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 11:39:48,196 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:39:48,204 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:39:50,011 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:40:06,263 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 11:40:06,263 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:40:06,263 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 11:40:06,263 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:40:06,271 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:40:08,630 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:40:25,184 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 11:40:25,184 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:40:25,184 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 11:40:25,184 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:40:25,193 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:40:27,548 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:40:44,932 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 11:40:44,932 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 11:40:44,932 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 11:40:44,933 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:40:44,941 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 11:40:46,776 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 11:41:03,031 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 11:41:03,031 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:41:03,031 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 11:41:03,031 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:41:03,949 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:41:04,009 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:41:06,165 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:41:31,475 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 11:41:31,475 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:41:31,478 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 11:41:31,478 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:41:31,486 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:41:33,688 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:41:56,557 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 11:41:56,557 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:41:56,557 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 11:41:56,557 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:41:56,567 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:41:58,745 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:42:21,229 - root - [INFO] - 	!!!Scores: {'accuracy': 0.485, 'average': 0.485}
2024-05-01 11:42:21,229 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:42:21,229 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 11:42:21,229 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:42:21,237 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:42:23,389 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:42:46,077 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 11:42:46,077 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:42:46,077 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 11:42:46,077 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:42:46,085 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:42:48,234 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:43:11,485 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 11:43:11,485 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:43:11,485 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 11:43:11,485 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:43:11,494 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:43:13,666 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:43:36,424 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 11:43:36,424 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:43:36,424 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 11:43:36,425 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:43:36,432 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:43:39,271 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:44:04,170 - root - [INFO] - 	!!!Scores: {'accuracy': 0.486, 'average': 0.486}
2024-05-01 11:44:04,170 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:44:04,170 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 11:44:04,170 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:44:04,180 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:44:06,383 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:44:29,694 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 11:44:29,694 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:44:29,694 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 11:44:29,694 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:44:29,702 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:44:31,911 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:44:54,760 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 11:44:54,760 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:44:54,760 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 11:44:54,761 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:44:54,768 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:44:57,612 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:45:21,378 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 11:45:21,378 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:45:21,378 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 11:45:21,379 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:45:21,388 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:45:24,224 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:45:47,598 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 11:45:47,599 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:45:47,599 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 11:45:47,599 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:45:47,607 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:45:49,766 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:46:12,628 - root - [INFO] - 	!!!Scores: {'accuracy': 0.483, 'average': 0.483}
2024-05-01 11:46:12,628 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:46:12,628 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 11:46:12,628 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:46:12,636 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:46:15,458 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:46:38,845 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 11:46:38,845 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:46:38,845 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 11:46:38,845 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:46:38,854 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:46:41,685 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:47:06,097 - root - [INFO] - 	!!!Scores: {'accuracy': 0.489, 'average': 0.489}
2024-05-01 11:47:06,098 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 11:47:06,098 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 11:47:06,098 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:47:06,106 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 11:47:08,320 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 11:47:31,307 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 11:47:31,383 - root - [INFO] - Unexpected keys: []
2024-05-01 11:47:31,616 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:47:31,616 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 11:47:31,616 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:47:31,624 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 11:47:32,527 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:47:32,550 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:47:33,066 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 11:47:38,850 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 11:47:38,850 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:47:38,850 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 11:47:38,851 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:47:38,859 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:47:39,376 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 11:47:44,856 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 11:47:44,856 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:47:44,857 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 11:47:44,857 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:47:44,865 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:47:45,384 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 11:47:50,852 - root - [INFO] - 	!!!Scores: {'accuracy': 0.82, 'average': 0.82}
2024-05-01 11:47:50,853 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:47:50,853 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 11:47:50,853 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:47:50,861 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:47:51,373 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 11:47:56,753 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:47:56,753 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:47:56,753 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 11:47:56,753 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:47:56,761 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:47:57,275 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 11:48:02,730 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 11:48:02,730 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:48:02,730 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 11:48:02,730 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:02,738 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:48:03,254 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 11:48:08,706 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 11:48:08,707 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:48:08,707 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 11:48:08,707 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:08,714 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:48:09,231 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 11:48:14,618 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 11:48:14,618 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:48:14,618 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 11:48:14,619 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:14,626 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:48:15,138 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 11:48:20,723 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 11:48:20,723 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:48:20,723 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 11:48:20,723 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:20,731 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:48:21,243 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 11:48:26,688 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 11:48:26,689 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 11:48:26,689 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 11:48:26,689 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:26,696 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 11:48:27,214 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 11:48:32,756 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 11:48:32,756 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:32,756 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 11:48:32,756 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:32,764 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 11:48:33,665 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:48:33,674 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:33,736 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:35,179 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:48:35,180 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:35,180 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 11:48:35,180 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:35,187 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:35,238 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:36,685 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:48:36,685 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:36,685 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 11:48:36,685 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:36,693 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:36,757 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:38,226 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:48:38,226 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:38,226 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 11:48:38,226 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:38,234 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:38,285 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:39,719 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:48:39,719 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:39,719 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 11:48:39,719 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:39,726 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:39,777 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:41,219 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:48:41,219 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:41,219 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 11:48:41,219 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:41,227 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:41,291 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:42,741 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 11:48:42,741 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:42,741 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 11:48:42,741 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:42,749 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:42,799 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:44,238 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 11:48:44,238 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:44,238 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 11:48:44,238 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:44,246 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:44,309 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:45,761 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:48:45,762 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:45,762 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 11:48:45,762 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:45,769 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:45,821 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:47,266 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:48:47,266 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:47,266 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 11:48:47,267 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:47,274 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:47,325 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:48,772 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:48:48,773 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:48,773 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 11:48:48,773 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:48,780 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:48,845 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:50,301 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 11:48:50,301 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:50,301 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 11:48:50,301 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:50,309 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:50,359 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:51,804 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:48:51,804 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:51,804 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 11:48:51,804 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:51,812 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:51,863 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:53,349 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:48:53,350 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:53,350 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 11:48:53,350 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:53,357 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:53,408 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:54,851 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 11:48:54,852 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 11:48:54,852 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 11:48:54,852 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:54,859 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 11:48:54,923 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 11:48:56,403 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 11:48:56,403 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:48:56,403 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 11:48:56,403 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:48:57,306 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:48:57,363 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:49:00,953 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:49:10,581 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 11:49:10,581 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:49:10,581 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 11:49:10,581 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:49:10,589 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:49:14,227 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:49:23,942 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 11:49:23,942 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:49:23,942 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 11:49:23,943 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:49:23,950 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:49:27,540 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:49:37,321 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 11:49:37,321 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:49:37,321 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 11:49:37,321 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:49:37,330 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:49:40,967 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:49:51,077 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 11:49:51,077 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 11:49:51,077 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 11:49:51,077 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:49:51,085 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 11:49:54,690 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 11:50:04,664 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 11:50:04,664 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:50:04,664 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 11:50:04,664 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:50:04,672 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 11:50:05,598 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:50:05,650 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:50:07,139 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 11:50:12,461 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 11:50:12,461 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:50:12,461 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 11:50:12,461 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:50:12,469 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:50:13,958 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 11:50:19,145 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 11:50:19,145 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:50:19,145 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 11:50:19,145 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:50:19,153 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:50:20,649 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 11:50:26,350 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 11:50:26,350 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:50:26,350 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 11:50:26,350 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:50:26,358 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:50:27,849 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 11:50:33,677 - root - [INFO] - 	!!!Scores: {'accuracy': 0.538, 'average': 0.538}
2024-05-01 11:50:33,677 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:50:33,678 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 11:50:33,678 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:50:33,685 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:50:35,178 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 11:50:40,587 - root - [INFO] - 	!!!Scores: {'accuracy': 0.541, 'average': 0.541}
2024-05-01 11:50:40,587 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:50:40,587 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 11:50:40,588 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:50:40,595 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:50:42,089 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 11:50:47,927 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 11:50:47,928 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:50:47,928 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 11:50:47,928 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:50:47,936 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:50:49,432 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 11:50:54,856 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 11:50:54,856 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:50:54,856 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 11:50:54,856 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:50:54,865 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:50:56,340 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 11:51:01,916 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 11:51:01,916 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:51:01,916 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 11:51:01,917 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:01,925 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:51:03,428 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 11:51:09,513 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 11:51:09,513 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 11:51:09,513 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 11:51:09,513 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:09,521 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 11:51:11,002 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 11:51:15,729 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 11:51:15,730 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:15,730 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 11:51:15,730 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:15,737 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 11:51:16,658 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:51:16,676 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:16,864 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:18,656 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 11:51:18,656 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:18,656 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 11:51:18,656 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:18,664 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:18,835 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:20,596 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 11:51:20,596 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:20,596 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 11:51:20,596 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:20,604 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:20,820 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:22,747 - root - [INFO] - 	!!!Scores: {'accuracy': 0.708, 'average': 0.708}
2024-05-01 11:51:22,747 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:22,747 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 11:51:22,747 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:22,754 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:22,970 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:24,898 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 11:51:24,898 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:24,898 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 11:51:24,898 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:24,905 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:25,088 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:26,838 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 11:51:26,838 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:26,838 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 11:51:26,839 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:26,846 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:27,021 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:28,820 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 11:51:28,820 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:28,820 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 11:51:28,821 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:28,828 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:29,000 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:30,812 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 11:51:30,812 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:30,812 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 11:51:30,812 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:30,820 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:31,093 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:32,864 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 11:51:32,864 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:32,864 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 11:51:32,864 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:32,872 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:33,045 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:34,844 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 11:51:34,844 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 11:51:34,844 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 11:51:34,844 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:34,852 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 11:51:35,140 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 11:51:36,902 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 11:51:36,902 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:51:36,902 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 11:51:36,902 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:36,910 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 11:51:37,809 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:51:37,825 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:51:38,042 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 11:51:39,628 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 11:51:39,628 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:51:39,628 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 11:51:39,628 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:39,636 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:51:39,844 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 11:51:41,462 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 11:51:41,462 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:51:41,462 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 11:51:41,463 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:41,470 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:51:41,678 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 11:51:43,274 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 11:51:43,274 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:51:43,275 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 11:51:43,275 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:43,282 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:51:43,505 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 11:51:45,038 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 11:51:45,038 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:51:45,038 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 11:51:45,038 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:45,046 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:51:45,253 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 11:51:46,852 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 11:51:46,852 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:51:46,852 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 11:51:46,852 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:46,860 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:51:47,068 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 11:51:48,675 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 11:51:48,675 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:51:48,675 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 11:51:48,675 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:48,683 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:51:48,890 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 11:51:50,451 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 11:51:50,451 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 11:51:50,451 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 11:51:50,452 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:50,459 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 11:51:50,666 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 11:51:52,222 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 11:51:52,222 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 11:51:52,222 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 11:51:52,222 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:51:52,235 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 11:51:53,132 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 11:51:53,823 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 11:52:15,777 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 11:57:29,622 - root - [INFO] - 	!!!Scores: {'accuracy': 0.426, 'average': 0.426}
2024-05-01 11:57:29,623 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:57:29,623 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 11:57:29,623 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:57:30,539 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 11:57:30,651 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:57:36,650 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:58:01,111 - root - [INFO] - 	!!!Scores: {'accuracy': 0.927, 'average': 0.927}
2024-05-01 11:58:01,112 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:58:01,112 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 11:58:01,112 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:58:01,121 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:58:07,200 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:58:32,152 - root - [INFO] - 	!!!Scores: {'accuracy': 0.928, 'average': 0.928}
2024-05-01 11:58:32,152 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:58:32,152 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 11:58:32,152 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:58:32,161 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:58:38,170 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:59:02,741 - root - [INFO] - 	!!!Scores: {'accuracy': 0.929, 'average': 0.929}
2024-05-01 11:59:02,741 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:59:02,742 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 11:59:02,742 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:59:02,750 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:59:08,785 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 11:59:33,406 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 11:59:33,407 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 11:59:33,407 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 11:59:33,407 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 11:59:33,416 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 11:59:39,432 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:00:04,726 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 12:00:04,726 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:00:04,726 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 12:00:04,726 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:00:05,421 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:00:05,474 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:00:07,273 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:00:25,769 - root - [INFO] - 	!!!Scores: {'accuracy': 0.626, 'average': 0.626}
2024-05-01 12:00:25,769 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:00:25,769 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 12:00:25,770 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:00:25,778 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:00:27,626 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:00:44,177 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 12:00:44,177 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:00:44,177 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 12:00:44,178 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:00:44,185 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:00:46,000 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:01:02,263 - root - [INFO] - 	!!!Scores: {'accuracy': 0.688, 'average': 0.688}
2024-05-01 12:01:02,264 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:01:02,264 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 12:01:02,264 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:01:02,272 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:01:04,072 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:01:20,508 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 12:01:20,508 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:01:20,508 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 12:01:20,509 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:01:20,516 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:01:22,314 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:01:39,108 - root - [INFO] - 	!!!Scores: {'accuracy': 0.691, 'average': 0.691}
2024-05-01 12:01:39,108 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:01:39,108 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 12:01:39,108 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:01:39,117 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:01:40,936 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:01:57,411 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 12:01:57,411 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:01:57,411 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 12:01:57,411 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:01:57,419 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:01:59,791 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:02:17,931 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 12:02:17,932 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:02:17,932 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 12:02:17,932 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:02:17,940 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:02:19,778 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:02:36,480 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 12:02:36,480 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:02:36,480 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 12:02:36,481 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:02:36,488 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:02:38,333 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:02:54,865 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 12:02:54,865 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:02:54,866 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 12:02:54,866 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:02:54,874 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:02:57,257 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:03:14,427 - root - [INFO] - 	!!!Scores: {'accuracy': 0.623, 'average': 0.623}
2024-05-01 12:03:14,427 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:03:14,427 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 12:03:14,427 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:03:14,435 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:03:16,805 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:03:33,665 - root - [INFO] - 	!!!Scores: {'accuracy': 0.659, 'average': 0.659}
2024-05-01 12:03:33,665 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:03:33,665 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 12:03:33,665 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:03:33,673 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:03:35,468 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:03:52,008 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 12:03:52,009 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:03:52,009 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 12:03:52,009 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:03:52,018 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:03:54,368 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:04:11,249 - root - [INFO] - 	!!!Scores: {'accuracy': 0.633, 'average': 0.633}
2024-05-01 12:04:11,249 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:04:11,249 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 12:04:11,249 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:04:11,257 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:04:13,600 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:04:31,347 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 12:04:31,347 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:04:31,347 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 12:04:31,347 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:04:31,355 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:04:33,200 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:04:49,734 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 12:04:49,734 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:04:49,734 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 12:04:49,734 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:04:50,659 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:04:50,711 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:04:52,517 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:05:10,443 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 12:05:10,444 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:05:10,444 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 12:05:10,444 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:05:10,453 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:05:12,288 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:05:28,538 - root - [INFO] - 	!!!Scores: {'accuracy': 0.525, 'average': 0.525}
2024-05-01 12:05:28,539 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:05:28,539 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 12:05:28,539 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:05:28,547 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:05:30,359 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:05:46,217 - root - [INFO] - 	!!!Scores: {'accuracy': 0.533, 'average': 0.533}
2024-05-01 12:05:46,217 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:05:46,217 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 12:05:46,217 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:05:46,225 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:05:48,022 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:06:04,072 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 12:06:04,073 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:06:04,073 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 12:06:04,073 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:06:04,081 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:06:05,872 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:06:22,339 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 12:06:22,339 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:06:22,340 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 12:06:22,340 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:06:22,348 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:06:24,171 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:06:40,294 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 12:06:40,294 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:06:40,294 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 12:06:40,294 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:06:40,302 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:06:42,675 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:07:00,338 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 12:07:00,338 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:07:00,338 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 12:07:00,338 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:07:00,346 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:07:02,183 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:07:18,593 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 12:07:18,593 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:07:18,594 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 12:07:18,594 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:07:18,602 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:07:20,439 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:07:36,688 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 12:07:36,688 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:07:36,688 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 12:07:36,688 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:07:36,697 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:07:39,080 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:07:55,943 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 12:07:55,944 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:07:55,944 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 12:07:55,944 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:07:55,951 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:07:58,508 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:08:15,087 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 12:08:15,087 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:08:15,087 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 12:08:15,087 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:08:15,095 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:08:16,899 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:08:33,148 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 12:08:33,148 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:08:33,148 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 12:08:33,149 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:08:33,156 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:08:35,533 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:08:52,079 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 12:08:52,079 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:08:52,079 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 12:08:52,079 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:08:52,088 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:08:54,432 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:09:11,805 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 12:09:11,805 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:09:11,805 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 12:09:11,805 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:09:11,813 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:09:13,645 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:09:29,892 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 12:09:29,892 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:09:29,892 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 12:09:29,892 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:09:30,816 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:09:30,877 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:09:33,037 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:09:58,327 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 12:09:58,327 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:09:58,327 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 12:09:58,328 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:09:58,336 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:10:00,538 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:10:23,403 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 12:10:23,403 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:10:23,403 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 12:10:23,404 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:10:23,411 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:10:25,594 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:10:48,075 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 12:10:48,075 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:10:48,075 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 12:10:48,075 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:10:48,083 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:10:50,235 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:11:12,912 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 12:11:12,913 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:11:12,913 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 12:11:12,913 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:11:12,922 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:11:15,073 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:11:38,323 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 12:11:38,323 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:11:38,323 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 12:11:38,323 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:11:38,331 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:11:40,508 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:12:03,279 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 12:12:03,279 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:12:03,279 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 12:12:03,279 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:12:03,287 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:12:06,138 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:12:31,017 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 12:12:31,017 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:12:31,017 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 12:12:31,017 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:12:31,026 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:12:33,243 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:12:56,399 - root - [INFO] - 	!!!Scores: {'accuracy': 0.489, 'average': 0.489}
2024-05-01 12:12:56,399 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:12:56,400 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 12:12:56,400 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:12:56,408 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:12:58,617 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:13:21,475 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 12:13:21,475 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:13:21,475 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 12:13:21,475 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:13:21,484 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:13:24,337 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:13:48,071 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 12:13:48,071 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:13:48,071 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 12:13:48,072 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:13:48,080 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:13:50,942 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:14:14,312 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 12:14:14,312 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:14:14,312 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 12:14:14,312 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:14:14,320 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:14:16,488 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:14:39,350 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 12:14:39,351 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:14:39,351 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 12:14:39,351 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:14:39,360 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:14:42,189 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:15:05,577 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 12:15:05,577 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:15:05,577 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 12:15:05,578 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:15:05,586 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:15:08,418 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:15:32,847 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 12:15:32,847 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:15:32,847 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 12:15:32,847 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:15:32,856 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:15:35,077 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:15:58,104 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 12:15:58,180 - root - [INFO] - Unexpected keys: []
2024-05-01 12:15:58,413 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:15:58,413 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 12:15:58,413 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:15:58,421 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 12:15:59,346 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:15:59,368 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:15:59,884 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:05,674 - root - [INFO] - 	!!!Scores: {'accuracy': 0.816, 'average': 0.816}
2024-05-01 12:16:05,674 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:16:05,674 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 12:16:05,674 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:05,682 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:16:06,198 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:11,671 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:16:11,671 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:16:11,671 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 12:16:11,672 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:11,679 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:16:12,196 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:17,648 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 12:16:17,648 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:16:17,648 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 12:16:17,648 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:17,656 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:16:18,167 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:23,549 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:16:23,549 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:16:23,549 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 12:16:23,549 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:23,557 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:16:24,068 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:29,520 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 12:16:29,520 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:16:29,520 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 12:16:29,520 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:29,528 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:16:30,043 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:35,494 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 12:16:35,494 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:16:35,494 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 12:16:35,494 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:35,502 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:16:36,017 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:41,406 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 12:16:41,406 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:16:41,406 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 12:16:41,406 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:41,414 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:16:41,925 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:47,510 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 12:16:47,510 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:16:47,510 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 12:16:47,510 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:47,518 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:16:48,028 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:53,480 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 12:16:53,481 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:16:53,481 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 12:16:53,481 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:53,488 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:16:54,005 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 12:16:59,548 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 12:16:59,548 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:16:59,549 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 12:16:59,549 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:16:59,556 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 12:17:00,244 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:17:00,253 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:00,319 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:01,763 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:17:01,763 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:01,764 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 12:17:01,764 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:01,771 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:01,822 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:03,270 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:17:03,270 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:03,270 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 12:17:03,270 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:03,278 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:03,342 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:04,812 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:17:04,812 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:04,812 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 12:17:04,812 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:04,820 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:04,871 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:06,308 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:17:06,308 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:06,308 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 12:17:06,308 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:06,316 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:06,367 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:07,807 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:17:07,808 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:07,808 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 12:17:07,808 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:07,816 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:07,880 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:09,329 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 12:17:09,329 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:09,329 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 12:17:09,329 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:09,337 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:09,387 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:10,826 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 12:17:10,826 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:10,826 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 12:17:10,827 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:10,834 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:10,898 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:12,349 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:17:12,349 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:12,349 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 12:17:12,349 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:12,356 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:12,408 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:13,851 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:17:13,851 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:13,851 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 12:17:13,851 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:13,859 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:13,910 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:15,358 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:17:15,358 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:15,358 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 12:17:15,358 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:15,366 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:15,431 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:16,887 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 12:17:16,887 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:16,887 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 12:17:16,887 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:16,895 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:16,946 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:18,388 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:17:18,388 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:18,388 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 12:17:18,388 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:18,395 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:18,446 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:19,934 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:17:19,934 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:19,934 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 12:17:19,934 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:19,942 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:19,993 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:21,435 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:17:21,436 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:17:21,436 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 12:17:21,436 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:21,443 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:17:21,508 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 12:17:22,989 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:17:22,989 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:17:22,989 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 12:17:22,989 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:23,691 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:17:23,745 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:17:27,342 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:17:36,981 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 12:17:36,981 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:17:36,981 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 12:17:36,981 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:36,990 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:17:40,521 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:17:50,296 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 12:17:50,296 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:17:50,296 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 12:17:50,296 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:17:50,304 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:17:53,890 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:18:03,766 - root - [INFO] - 	!!!Scores: {'accuracy': 0.684, 'average': 0.684}
2024-05-01 12:18:03,767 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:18:03,767 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 12:18:03,767 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:18:03,775 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:18:07,407 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:18:17,605 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 12:18:17,605 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:18:17,605 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 12:18:17,605 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:18:17,613 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:18:21,202 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:18:31,259 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 12:18:31,259 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:18:31,259 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 12:18:31,259 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:18:31,267 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 12:18:31,945 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:18:31,997 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:18:33,487 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 12:18:38,864 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 12:18:38,864 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:18:38,864 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 12:18:38,864 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:18:38,872 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:18:40,362 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 12:18:45,598 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 12:18:45,598 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:18:45,598 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 12:18:45,598 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:18:45,607 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:18:47,105 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 12:18:52,854 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 12:18:52,854 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:18:52,855 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 12:18:52,855 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:18:52,863 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:18:54,354 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 12:19:00,232 - root - [INFO] - 	!!!Scores: {'accuracy': 0.533, 'average': 0.533}
2024-05-01 12:19:00,232 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:19:00,232 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 12:19:00,232 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:00,242 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:19:01,729 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 12:19:07,203 - root - [INFO] - 	!!!Scores: {'accuracy': 0.538, 'average': 0.538}
2024-05-01 12:19:07,204 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:19:07,204 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 12:19:07,204 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:07,215 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:19:08,719 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 12:19:14,613 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 12:19:14,613 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:19:14,613 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 12:19:14,613 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:14,626 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:19:16,125 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 12:19:21,578 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 12:19:21,578 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:19:21,578 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 12:19:21,578 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:21,586 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:19:23,067 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 12:19:28,685 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 12:19:28,686 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:19:28,686 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 12:19:28,686 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:28,695 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:19:30,193 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 12:19:36,312 - root - [INFO] - 	!!!Scores: {'accuracy': 0.632, 'average': 0.632}
2024-05-01 12:19:36,312 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:19:36,312 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 12:19:36,312 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:36,322 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:19:37,802 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 12:19:42,588 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 12:19:42,588 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:19:42,588 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 12:19:42,589 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:42,596 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 12:19:43,305 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:19:43,321 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:19:43,510 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 12:19:45,321 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 12:19:45,321 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:19:45,321 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 12:19:45,322 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:45,331 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:19:45,505 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 12:19:47,290 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 12:19:47,290 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:19:47,290 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 12:19:47,290 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:47,304 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:19:47,527 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 12:19:49,472 - root - [INFO] - 	!!!Scores: {'accuracy': 0.708, 'average': 0.708}
2024-05-01 12:19:49,472 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:19:49,472 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 12:19:49,472 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:49,481 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:19:49,699 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 12:19:51,654 - root - [INFO] - 	!!!Scores: {'accuracy': 0.694, 'average': 0.694}
2024-05-01 12:19:51,654 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:19:51,654 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 12:19:51,654 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:51,664 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:19:51,849 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 12:19:53,619 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 12:19:53,619 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:19:53,619 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 12:19:53,619 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:53,627 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:19:53,805 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 12:19:55,612 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 12:19:55,612 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:19:55,612 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 12:19:55,612 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:55,621 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:19:55,796 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 12:19:57,607 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 12:19:57,607 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:19:57,607 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 12:19:57,607 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:57,618 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:19:57,895 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 12:19:59,676 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 12:19:59,676 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:19:59,676 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 12:19:59,677 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:19:59,685 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:19:59,860 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 12:20:01,659 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 12:20:01,659 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:20:01,659 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 12:20:01,660 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:01,667 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:20:01,962 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 12:20:03,729 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 12:20:03,729 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:20:03,729 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 12:20:03,729 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:03,738 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 12:20:04,441 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:20:04,455 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:20:04,673 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 12:20:06,262 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 12:20:06,262 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:20:06,262 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 12:20:06,262 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:06,270 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:20:06,480 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 12:20:08,108 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 12:20:08,108 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:20:08,108 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 12:20:08,109 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:08,117 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:20:08,327 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 12:20:09,925 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 12:20:09,925 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:20:09,925 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 12:20:09,925 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:09,933 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:20:10,157 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 12:20:11,696 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 12:20:11,696 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:20:11,696 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 12:20:11,696 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:11,704 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:20:11,913 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 12:20:13,520 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 12:20:13,520 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:20:13,520 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 12:20:13,520 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:13,528 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:20:13,740 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 12:20:15,356 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 12:20:15,356 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:20:15,356 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 12:20:15,356 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:15,364 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:20:15,574 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 12:20:17,144 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 12:20:17,144 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:20:17,144 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 12:20:17,144 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:17,152 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:20:17,360 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 12:20:18,923 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 12:20:18,923 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 12:20:18,923 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 12:20:18,923 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:20:18,936 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 12:20:19,653 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:20:20,474 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 12:20:42,635 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 12:25:56,581 - root - [INFO] - 	!!!Scores: {'accuracy': 0.427, 'average': 0.427}
2024-05-01 12:25:56,582 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:25:56,582 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 12:25:56,582 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:25:57,282 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 12:25:57,394 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:26:03,375 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:26:27,905 - root - [INFO] - 	!!!Scores: {'accuracy': 0.928, 'average': 0.928}
2024-05-01 12:26:27,906 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:26:27,906 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 12:26:27,906 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:26:27,916 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:26:34,147 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:26:59,158 - root - [INFO] - 	!!!Scores: {'accuracy': 0.93, 'average': 0.93}
2024-05-01 12:26:59,159 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:26:59,159 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 12:26:59,159 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:26:59,170 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:27:05,192 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:27:30,017 - root - [INFO] - 	!!!Scores: {'accuracy': 0.93, 'average': 0.93}
2024-05-01 12:27:30,017 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:27:30,018 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 12:27:30,018 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:27:30,031 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:27:36,102 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:28:01,053 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 12:28:01,054 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:28:01,054 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 12:28:01,054 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:28:01,064 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:28:07,109 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:28:32,608 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 12:28:32,608 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:28:32,608 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 12:28:32,608 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:28:33,523 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:28:33,573 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:28:35,400 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:28:53,930 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 12:28:53,930 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:28:53,930 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 12:28:53,930 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:28:53,939 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:28:55,788 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:29:12,355 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 12:29:12,355 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:29:12,355 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 12:29:12,355 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:29:12,365 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:29:14,190 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:29:30,479 - root - [INFO] - 	!!!Scores: {'accuracy': 0.685, 'average': 0.685}
2024-05-01 12:29:30,479 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:29:30,479 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 12:29:30,479 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:29:30,488 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:29:32,312 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:29:48,767 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 12:29:48,767 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:29:48,767 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 12:29:48,767 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:29:48,776 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:29:50,590 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:30:07,405 - root - [INFO] - 	!!!Scores: {'accuracy': 0.688, 'average': 0.688}
2024-05-01 12:30:07,405 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:30:07,406 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 12:30:07,406 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:30:07,415 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:30:09,240 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:30:25,719 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 12:30:25,719 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:30:25,719 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 12:30:25,719 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:30:25,727 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:30:28,109 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:30:46,261 - root - [INFO] - 	!!!Scores: {'accuracy': 0.657, 'average': 0.657}
2024-05-01 12:30:46,262 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:30:46,262 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 12:30:46,262 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:30:46,271 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:30:48,117 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:31:04,839 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 12:31:04,840 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:31:04,840 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 12:31:04,840 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:31:04,849 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:31:06,867 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:31:23,394 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 12:31:23,394 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:31:23,394 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 12:31:23,395 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:31:23,402 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:31:25,805 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:31:42,985 - root - [INFO] - 	!!!Scores: {'accuracy': 0.62, 'average': 0.62}
2024-05-01 12:31:42,986 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:31:42,986 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 12:31:42,986 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:31:42,995 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:31:45,380 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:32:02,252 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 12:32:02,252 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:32:02,252 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 12:32:02,253 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:32:02,260 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:32:04,078 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:32:20,617 - root - [INFO] - 	!!!Scores: {'accuracy': 0.68, 'average': 0.68}
2024-05-01 12:32:20,617 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:32:20,617 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 12:32:20,617 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:32:20,626 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:32:22,985 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:32:39,867 - root - [INFO] - 	!!!Scores: {'accuracy': 0.629, 'average': 0.629}
2024-05-01 12:32:39,867 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:32:39,867 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 12:32:39,867 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:32:39,875 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:32:42,237 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:32:59,984 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 12:32:59,984 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:32:59,984 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 12:32:59,984 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:32:59,993 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:33:01,852 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:33:18,402 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 12:33:18,402 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:33:18,402 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 12:33:18,402 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:33:19,818 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:33:19,872 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:33:21,674 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:33:39,593 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 12:33:39,594 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:33:39,594 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 12:33:39,594 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:33:39,602 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:33:41,445 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:33:57,710 - root - [INFO] - 	!!!Scores: {'accuracy': 0.524, 'average': 0.524}
2024-05-01 12:33:57,710 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:33:57,710 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 12:33:57,710 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:33:57,719 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:33:59,538 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:34:15,389 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 12:34:15,389 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:34:15,389 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 12:34:15,389 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:34:15,397 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:34:17,194 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:34:33,231 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 12:34:33,231 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:34:33,231 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 12:34:33,232 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:34:33,239 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:34:35,030 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:34:51,481 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 12:34:51,481 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:34:51,482 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 12:34:51,482 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:34:51,489 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:34:53,299 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:35:09,405 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 12:35:09,405 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:35:09,405 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 12:35:09,405 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:35:09,414 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:35:11,778 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:35:29,422 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 12:35:29,422 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:35:29,422 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 12:35:29,422 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:35:29,430 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:35:31,263 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:35:47,650 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 12:35:47,650 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:35:47,650 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 12:35:47,650 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:35:47,658 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:35:49,487 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:36:05,723 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 12:36:05,724 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:36:05,724 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 12:36:05,724 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:36:05,732 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:36:08,101 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:36:24,957 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 12:36:24,957 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:36:24,957 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 12:36:24,958 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:36:24,966 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:36:27,328 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:36:43,906 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 12:36:43,906 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:36:43,906 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 12:36:43,906 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:36:43,914 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:36:45,724 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:37:01,977 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 12:37:01,977 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:37:01,977 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 12:37:01,977 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:37:01,985 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:37:04,337 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:37:20,884 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 12:37:20,884 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:37:20,884 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 12:37:20,884 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:37:20,893 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:37:23,237 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:37:40,613 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 12:37:40,614 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 12:37:40,614 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 12:37:40,614 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:37:40,622 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:37:42,461 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:37:58,709 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 12:37:58,709 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:37:58,709 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 12:37:58,709 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:37:59,683 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:37:59,741 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:38:01,902 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:38:27,199 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 12:38:27,199 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:38:27,199 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 12:38:27,199 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:38:27,207 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:38:29,409 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:38:52,275 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 12:38:52,275 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:38:52,275 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 12:38:52,275 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:38:52,284 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:38:54,459 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:39:16,936 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 12:39:16,936 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:39:16,936 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 12:39:16,936 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:39:16,944 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:39:19,101 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:39:41,786 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 12:39:41,786 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:39:41,786 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 12:39:41,787 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:39:41,795 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:39:43,949 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:40:07,197 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 12:40:07,197 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:40:07,197 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 12:40:07,197 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:40:07,207 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:40:09,389 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:40:32,137 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 12:40:32,137 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:40:32,137 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 12:40:32,137 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:40:32,145 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:40:34,990 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:40:59,866 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 12:40:59,866 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:40:59,866 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 12:40:59,867 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:40:59,876 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:41:02,083 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:41:25,235 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 12:41:25,235 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:41:25,235 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 12:41:25,235 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:41:25,244 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:41:27,447 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:41:50,293 - root - [INFO] - 	!!!Scores: {'accuracy': 0.489, 'average': 0.489}
2024-05-01 12:41:50,293 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:41:50,293 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 12:41:50,293 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:41:50,301 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:41:53,146 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:42:16,884 - root - [INFO] - 	!!!Scores: {'accuracy': 0.489, 'average': 0.489}
2024-05-01 12:42:16,884 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:42:16,884 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 12:42:16,884 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:42:16,894 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:42:19,728 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:42:43,101 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 12:42:43,101 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:42:43,101 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 12:42:43,101 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:42:43,109 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:42:45,265 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:43:08,122 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 12:43:08,122 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:43:08,122 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 12:43:08,123 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:43:08,131 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:43:10,950 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:43:34,332 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 12:43:34,332 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:43:34,332 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 12:43:34,332 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:43:34,342 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:43:37,157 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:44:01,559 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 12:44:01,559 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 12:44:01,559 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 12:44:01,559 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:44:01,567 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 12:44:03,782 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 12:44:26,645 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 12:44:26,720 - root - [INFO] - Unexpected keys: []
2024-05-01 12:44:26,954 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:44:26,954 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 12:44:26,954 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:44:26,962 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 12:44:27,662 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:44:27,685 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:44:28,200 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 12:44:33,955 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 12:44:33,955 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:44:33,955 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 12:44:33,955 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:44:33,963 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:44:34,481 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 12:44:39,931 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 12:44:39,931 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:44:39,931 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 12:44:39,931 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:44:39,940 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:44:40,459 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 12:44:45,909 - root - [INFO] - 	!!!Scores: {'accuracy': 0.816, 'average': 0.816}
2024-05-01 12:44:45,909 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:44:45,909 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 12:44:45,910 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:44:45,918 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:44:46,431 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 12:44:51,808 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:44:51,808 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:44:51,808 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 12:44:51,808 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:44:51,816 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:44:52,330 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 12:44:57,783 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 12:44:57,783 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:44:57,783 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 12:44:57,783 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:44:57,791 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:44:58,309 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 12:45:03,763 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 12:45:03,763 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:45:03,763 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 12:45:03,763 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:03,771 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:45:04,289 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 12:45:09,674 - root - [INFO] - 	!!!Scores: {'accuracy': 0.763, 'average': 0.763}
2024-05-01 12:45:09,675 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:45:09,675 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 12:45:09,675 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:09,683 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:45:10,303 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 12:45:15,888 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 12:45:15,888 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:45:15,888 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 12:45:15,888 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:15,896 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:45:16,410 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 12:45:21,850 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 12:45:21,850 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 12:45:21,851 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 12:45:21,851 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:21,859 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 12:45:22,378 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 12:45:27,916 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 12:45:27,916 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:27,916 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 12:45:27,916 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:27,924 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 12:45:28,628 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:45:28,638 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:28,702 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:30,147 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:45:30,147 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:30,147 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 12:45:30,147 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:30,155 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:30,206 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:31,652 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:45:31,653 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:31,653 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 12:45:31,653 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:31,661 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:31,725 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:33,192 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:45:33,192 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:33,192 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 12:45:33,193 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:33,200 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:33,251 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:34,685 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:45:34,686 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:34,686 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 12:45:34,686 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:34,693 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:34,744 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:36,183 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:45:36,183 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:36,183 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 12:45:36,183 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:36,191 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:36,255 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:37,705 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 12:45:37,705 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:37,705 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 12:45:37,705 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:37,713 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:37,763 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:39,200 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 12:45:39,200 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:39,200 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 12:45:39,200 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:39,207 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:39,271 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:40,720 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 12:45:40,720 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:40,720 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 12:45:40,720 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:40,728 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:40,779 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:42,222 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:45:42,222 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:42,222 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 12:45:42,222 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:42,230 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:42,281 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:43,728 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:45:43,728 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:43,728 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 12:45:43,728 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:43,735 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:43,800 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:45,257 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 12:45:45,257 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:45,257 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 12:45:45,258 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:45,265 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:45,316 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:46,760 - root - [INFO] - 	!!!Scores: {'accuracy': 0.708, 'average': 0.708}
2024-05-01 12:45:46,761 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:46,761 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 12:45:46,761 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:46,769 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:46,820 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:48,307 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:45:48,308 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:48,308 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 12:45:48,308 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:48,315 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:48,366 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:49,808 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 12:45:49,809 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 12:45:49,809 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 12:45:49,809 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:49,816 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 12:45:49,881 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 12:45:51,359 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 12:45:51,360 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:45:51,360 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 12:45:51,360 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:45:52,068 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:45:52,124 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:45:55,725 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:46:05,363 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 12:46:05,363 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:46:05,363 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 12:46:05,363 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:46:05,371 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:46:08,903 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:46:18,619 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 12:46:18,620 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:46:18,620 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 12:46:18,620 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:46:18,628 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:46:22,236 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:46:32,019 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 12:46:32,019 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:46:32,019 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 12:46:32,020 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:46:32,028 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:46:35,669 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:46:45,779 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 12:46:45,779 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 12:46:45,779 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 12:46:45,780 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:46:45,788 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 12:46:49,398 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 12:46:59,380 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 12:46:59,380 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:46:59,380 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 12:46:59,380 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:46:59,388 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 12:47:00,113 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:47:00,167 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:47:01,668 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 12:47:07,001 - root - [INFO] - 	!!!Scores: {'accuracy': 0.629, 'average': 0.629}
2024-05-01 12:47:07,001 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:47:07,001 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 12:47:07,002 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:47:07,010 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:47:08,504 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 12:47:13,688 - root - [INFO] - 	!!!Scores: {'accuracy': 0.64, 'average': 0.64}
2024-05-01 12:47:13,688 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:47:13,688 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 12:47:13,689 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:47:13,696 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:47:15,194 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 12:47:20,898 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 12:47:20,898 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:47:20,899 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 12:47:20,899 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:47:20,906 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:47:22,397 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 12:47:28,225 - root - [INFO] - 	!!!Scores: {'accuracy': 0.538, 'average': 0.538}
2024-05-01 12:47:28,225 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:47:28,225 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 12:47:28,225 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:47:28,233 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:47:29,714 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 12:47:35,126 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 12:47:35,126 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:47:35,126 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 12:47:35,126 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:47:35,134 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:47:36,631 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 12:47:42,467 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 12:47:42,467 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:47:42,467 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 12:47:42,467 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:47:42,475 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:47:43,965 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 12:47:49,381 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 12:47:49,381 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:47:49,381 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 12:47:49,381 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:47:49,390 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:47:50,867 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 12:47:56,433 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 12:47:56,433 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:47:56,433 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 12:47:56,433 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:47:56,442 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:47:57,939 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 12:48:04,022 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 12:48:04,022 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 12:48:04,022 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 12:48:04,022 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:04,030 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 12:48:05,505 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 12:48:10,234 - root - [INFO] - 	!!!Scores: {'accuracy': 0.632, 'average': 0.632}
2024-05-01 12:48:10,234 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:10,234 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 12:48:10,234 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:10,242 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 12:48:10,967 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:48:10,984 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:11,173 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:12,962 - root - [INFO] - 	!!!Scores: {'accuracy': 0.583, 'average': 0.583}
2024-05-01 12:48:12,962 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:12,962 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 12:48:12,962 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:12,970 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:13,142 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:14,903 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 12:48:14,903 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:14,903 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 12:48:14,903 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:14,911 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:15,128 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:17,052 - root - [INFO] - 	!!!Scores: {'accuracy': 0.708, 'average': 0.708}
2024-05-01 12:48:17,052 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:17,052 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 12:48:17,052 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:17,060 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:17,276 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:19,205 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 12:48:19,205 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:19,205 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 12:48:19,205 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:19,213 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:19,395 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:21,145 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 12:48:21,145 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:21,145 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 12:48:21,145 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:21,153 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:21,328 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:23,128 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 12:48:23,128 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:23,128 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 12:48:23,128 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:23,136 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:23,308 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:25,118 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 12:48:25,118 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:25,118 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 12:48:25,118 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:25,126 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:25,407 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:27,175 - root - [INFO] - 	!!!Scores: {'accuracy': 0.542, 'average': 0.542}
2024-05-01 12:48:27,175 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:27,176 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 12:48:27,176 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:27,183 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:27,356 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:29,152 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 12:48:29,152 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 12:48:29,152 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 12:48:29,152 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:29,160 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 12:48:29,449 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 12:48:31,209 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 12:48:31,210 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:48:31,210 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 12:48:31,210 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:31,218 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 12:48:31,932 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:48:31,947 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:48:32,164 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 12:48:33,749 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 12:48:33,749 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:48:33,749 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 12:48:33,749 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:33,757 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:48:33,965 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 12:48:35,581 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 12:48:35,581 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:48:35,581 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 12:48:35,582 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:35,589 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:48:35,798 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 12:48:37,392 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 12:48:37,392 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:48:37,392 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 12:48:37,392 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:37,400 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:48:37,622 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 12:48:39,154 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 12:48:39,154 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:48:39,154 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 12:48:39,154 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:39,162 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:48:39,369 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 12:48:40,969 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 12:48:40,969 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:48:40,969 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 12:48:40,969 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:40,977 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:48:41,187 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 12:48:42,793 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 12:48:42,793 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:48:42,793 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 12:48:42,793 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:42,801 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:48:43,009 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 12:48:44,569 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 12:48:44,569 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 12:48:44,569 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 12:48:44,569 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:44,577 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 12:48:44,785 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 12:48:46,340 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 12:48:46,340 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 12:48:46,340 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 12:48:46,340 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:48:46,353 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 12:48:47,068 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:48:47,759 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 12:49:09,798 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 12:54:23,120 - root - [INFO] - 	!!!Scores: {'accuracy': 0.428, 'average': 0.428}
2024-05-01 12:54:23,120 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:54:23,120 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 12:54:23,121 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:54:23,843 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 12:54:23,956 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:54:29,956 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:54:54,276 - root - [INFO] - 	!!!Scores: {'accuracy': 0.927, 'average': 0.927}
2024-05-01 12:54:54,276 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:54:54,276 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 12:54:54,276 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:54:54,285 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:55:00,360 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:55:25,102 - root - [INFO] - 	!!!Scores: {'accuracy': 0.928, 'average': 0.928}
2024-05-01 12:55:25,102 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:55:25,102 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 12:55:25,102 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:55:25,111 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:55:31,118 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:55:55,713 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 12:55:55,714 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:55:55,714 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 12:55:55,714 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:55:55,722 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:56:01,772 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:56:26,388 - root - [INFO] - 	!!!Scores: {'accuracy': 0.924, 'average': 0.924}
2024-05-01 12:56:26,388 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 12:56:26,388 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 12:56:26,389 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:56:26,398 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 12:56:32,427 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 12:56:57,718 - root - [INFO] - 	!!!Scores: {'accuracy': 0.92, 'average': 0.92}
2024-05-01 12:56:57,718 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:56:57,718 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 12:56:57,718 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:56:58,665 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 12:56:58,719 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:57:00,522 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:57:19,000 - root - [INFO] - 	!!!Scores: {'accuracy': 0.624, 'average': 0.624}
2024-05-01 12:57:19,000 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:57:19,000 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 12:57:19,001 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:57:19,010 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:57:20,851 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:57:37,395 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 12:57:37,395 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:57:37,395 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 12:57:37,396 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:57:37,404 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:57:39,220 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:57:55,481 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 12:57:55,481 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:57:55,481 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 12:57:55,481 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:57:55,489 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:57:57,292 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:58:13,720 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 12:58:13,721 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:58:13,721 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 12:58:13,721 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:58:13,729 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:58:15,530 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:58:32,321 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 12:58:32,322 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:58:32,322 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 12:58:32,322 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:58:32,331 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:58:34,150 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:58:50,622 - root - [INFO] - 	!!!Scores: {'accuracy': 0.688, 'average': 0.688}
2024-05-01 12:58:50,622 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:58:50,622 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 12:58:50,622 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:58:50,630 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:58:53,010 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:59:11,144 - root - [INFO] - 	!!!Scores: {'accuracy': 0.657, 'average': 0.657}
2024-05-01 12:59:11,144 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:59:11,144 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 12:59:11,144 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:59:11,152 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:59:12,994 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:59:29,701 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 12:59:29,701 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:59:29,701 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 12:59:29,701 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:59:29,709 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:59:31,547 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 12:59:48,075 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 12:59:48,075 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 12:59:48,075 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 12:59:48,075 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 12:59:48,084 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 12:59:50,457 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:00:07,632 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 13:00:07,632 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:00:07,632 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 13:00:07,632 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:00:07,640 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:00:10,005 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:00:26,854 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 13:00:26,854 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:00:26,854 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 13:00:26,854 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:00:26,862 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:00:28,661 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:00:45,194 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 13:00:45,194 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:00:45,194 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 13:00:45,194 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:00:45,202 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:00:47,555 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:01:04,439 - root - [INFO] - 	!!!Scores: {'accuracy': 0.634, 'average': 0.634}
2024-05-01 13:01:04,440 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:01:04,440 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 13:01:04,440 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:01:04,449 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:01:06,792 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:01:24,528 - root - [INFO] - 	!!!Scores: {'accuracy': 0.663, 'average': 0.663}
2024-05-01 13:01:24,528 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:01:24,528 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 13:01:24,528 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:01:24,536 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:01:26,373 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:01:42,899 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 13:01:42,899 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:01:42,899 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 13:01:42,900 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:01:43,669 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:01:43,721 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:01:45,519 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:02:03,453 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 13:02:03,453 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:02:03,453 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 13:02:03,454 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:02:03,463 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:02:05,305 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:02:21,550 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 13:02:21,551 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:02:21,551 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 13:02:21,551 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:02:21,559 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:02:23,373 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:02:39,233 - root - [INFO] - 	!!!Scores: {'accuracy': 0.522, 'average': 0.522}
2024-05-01 13:02:39,234 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:02:39,234 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 13:02:39,234 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:02:39,242 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:02:41,041 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:02:57,089 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 13:02:57,089 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:02:57,089 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 13:02:57,089 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:02:57,097 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:02:58,892 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:03:15,358 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 13:03:15,358 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:03:15,358 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 13:03:15,359 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:03:15,368 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:03:17,184 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:03:33,299 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 13:03:33,299 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:03:33,299 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 13:03:33,299 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:03:33,307 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:03:35,882 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:03:53,529 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 13:03:53,529 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:03:53,529 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 13:03:53,529 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:03:53,537 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:03:55,389 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:04:11,780 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 13:04:11,780 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:04:11,780 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 13:04:11,780 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:04:11,788 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:04:13,631 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:04:29,873 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 13:04:29,873 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:04:29,873 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 13:04:29,873 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:04:29,883 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:04:32,263 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:04:49,126 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 13:04:49,127 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:04:49,127 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 13:04:49,127 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:04:49,135 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:04:51,510 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:05:08,079 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 13:05:08,079 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:05:08,079 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 13:05:08,079 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:05:08,087 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:05:09,886 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:05:26,127 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 13:05:26,127 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:05:26,127 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 13:05:26,128 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:05:26,136 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:05:28,492 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:05:45,043 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 13:05:45,044 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:05:45,044 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 13:05:45,044 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:05:45,053 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:05:47,400 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:06:04,775 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 13:06:04,775 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:06:04,775 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 13:06:04,775 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:06:04,783 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:06:06,623 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:06:22,862 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 13:06:22,863 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:06:22,863 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 13:06:22,863 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:06:23,611 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:06:23,670 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:06:25,829 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:06:51,119 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 13:06:51,119 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:06:51,120 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 13:06:51,120 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:06:51,129 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:06:53,337 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:07:16,192 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 13:07:16,192 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:07:16,192 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 13:07:16,192 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:07:16,200 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:07:18,376 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:07:40,847 - root - [INFO] - 	!!!Scores: {'accuracy': 0.485, 'average': 0.485}
2024-05-01 13:07:40,848 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:07:40,848 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 13:07:40,848 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:07:40,856 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:07:43,015 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:08:05,701 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 13:08:05,701 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:08:05,701 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 13:08:05,701 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:08:05,710 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:08:07,865 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:08:31,108 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 13:08:31,108 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:08:31,108 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 13:08:31,108 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:08:31,116 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:08:33,293 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:08:56,047 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 13:08:56,047 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:08:56,047 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 13:08:56,047 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:08:56,055 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:08:58,903 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:09:23,790 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 13:09:23,790 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:09:23,791 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 13:09:23,791 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:09:23,800 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:09:26,011 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:09:49,156 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 13:09:49,156 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:09:49,156 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 13:09:49,156 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:09:49,164 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:09:51,390 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:10:14,245 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 13:10:14,245 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:10:14,245 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 13:10:14,246 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:10:14,254 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:10:17,116 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:10:40,837 - root - [INFO] - 	!!!Scores: {'accuracy': 0.486, 'average': 0.486}
2024-05-01 13:10:40,838 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:10:40,838 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 13:10:40,838 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:10:40,846 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:10:43,696 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:11:07,066 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 13:11:07,066 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:11:07,066 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 13:11:07,066 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:11:07,074 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:11:09,246 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:11:32,100 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 13:11:32,101 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:11:32,101 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 13:11:32,101 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:11:32,110 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:11:34,940 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:11:58,308 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 13:11:58,308 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:11:58,308 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 13:11:58,308 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:11:58,316 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:12:01,128 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:12:25,534 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 13:12:25,534 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:12:25,534 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 13:12:25,535 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:12:25,543 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:12:27,757 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:12:50,622 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 13:12:50,702 - root - [INFO] - Unexpected keys: []
2024-05-01 13:12:50,940 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:12:50,940 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 13:12:50,940 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:12:50,949 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 13:12:51,915 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:12:51,939 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:12:52,456 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 13:12:58,213 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 13:12:58,213 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:12:58,214 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 13:12:58,214 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:12:58,221 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:12:58,738 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 13:13:04,184 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:13:04,184 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:13:04,184 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 13:13:04,184 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:04,192 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:13:04,709 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 13:13:10,156 - root - [INFO] - 	!!!Scores: {'accuracy': 0.816, 'average': 0.816}
2024-05-01 13:13:10,156 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:13:10,156 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 13:13:10,156 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:10,164 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:13:10,676 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 13:13:16,051 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:13:16,051 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:13:16,051 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 13:13:16,051 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:16,059 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:13:16,574 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 13:13:22,021 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 13:13:22,021 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:13:22,021 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 13:13:22,021 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:22,029 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:13:22,545 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 13:13:27,992 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 13:13:27,993 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:13:27,993 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 13:13:27,993 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:28,000 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:13:28,518 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 13:13:33,902 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 13:13:33,902 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:13:33,902 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 13:13:33,902 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:33,910 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:13:34,422 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 13:13:40,001 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 13:13:40,001 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:13:40,001 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 13:13:40,001 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:40,009 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:13:40,521 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 13:13:45,964 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 13:13:45,964 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:13:45,964 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 13:13:45,964 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:45,972 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:13:46,490 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 13:13:52,029 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 13:13:52,029 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:13:52,029 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 13:13:52,029 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:52,037 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 13:13:53,030 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:13:53,040 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:13:53,104 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 13:13:54,546 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:13:54,546 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:13:54,546 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 13:13:54,547 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:54,554 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:13:54,604 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 13:13:56,051 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:13:56,052 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:13:56,052 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 13:13:56,052 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:56,059 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:13:56,123 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 13:13:57,589 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:13:57,589 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:13:57,589 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 13:13:57,590 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:57,597 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:13:57,648 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 13:13:59,083 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:13:59,083 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:13:59,083 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 13:13:59,083 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:13:59,090 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:13:59,141 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:00,582 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:14:00,582 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:00,582 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 13:14:00,582 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:00,590 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:00,665 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:02,113 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 13:14:02,114 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:02,114 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 13:14:02,114 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:02,121 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:02,172 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:03,609 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 13:14:03,609 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:03,609 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 13:14:03,609 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:03,616 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:03,680 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:05,130 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 13:14:05,130 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:05,130 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 13:14:05,130 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:05,138 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:05,189 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:06,631 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:14:06,631 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:06,631 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 13:14:06,631 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:06,639 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:06,690 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:08,134 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:14:08,134 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:08,134 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 13:14:08,134 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:08,141 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:08,205 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:09,661 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 13:14:09,661 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:09,661 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 13:14:09,661 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:09,669 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:09,719 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:11,161 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:14:11,161 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:11,161 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 13:14:11,161 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:11,169 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:11,220 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:12,707 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:14:12,707 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:12,707 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 13:14:12,707 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:12,714 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:12,765 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:14,206 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:14:14,206 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:14:14,206 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 13:14:14,206 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:14,213 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:14:14,278 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 13:14:15,757 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:14:15,757 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:14:15,757 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 13:14:15,757 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:16,736 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:14:16,792 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:14:20,401 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:14:30,032 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 13:14:30,032 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:14:30,033 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 13:14:30,033 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:30,041 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:14:33,568 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:14:43,283 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 13:14:43,283 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:14:43,283 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 13:14:43,284 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:43,292 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:14:46,889 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:14:56,663 - root - [INFO] - 	!!!Scores: {'accuracy': 0.684, 'average': 0.684}
2024-05-01 13:14:56,663 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:14:56,663 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 13:14:56,663 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:14:56,671 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:15:00,293 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:15:10,388 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 13:15:10,388 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:15:10,388 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 13:15:10,388 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:15:10,396 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:15:13,988 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:15:23,971 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 13:15:23,971 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:15:23,971 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 13:15:23,972 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:15:23,979 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 13:15:24,998 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:15:25,050 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:15:26,535 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 13:15:31,866 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 13:15:31,866 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:15:31,866 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 13:15:31,867 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:15:31,874 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:15:33,360 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 13:15:38,547 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 13:15:38,547 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:15:38,547 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 13:15:38,548 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:15:38,556 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:15:40,050 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 13:15:45,755 - root - [INFO] - 	!!!Scores: {'accuracy': 0.65, 'average': 0.65}
2024-05-01 13:15:45,755 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:15:45,755 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 13:15:45,755 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:15:45,763 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:15:47,254 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 13:15:53,084 - root - [INFO] - 	!!!Scores: {'accuracy': 0.533, 'average': 0.533}
2024-05-01 13:15:53,084 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:15:53,084 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 13:15:53,084 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:15:53,092 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:15:54,568 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 13:15:59,976 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 13:15:59,977 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:15:59,977 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 13:15:59,977 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:15:59,984 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:16:01,477 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 13:16:07,315 - root - [INFO] - 	!!!Scores: {'accuracy': 0.644, 'average': 0.644}
2024-05-01 13:16:07,315 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:16:07,315 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 13:16:07,315 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:07,323 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:16:08,808 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 13:16:14,219 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 13:16:14,219 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:16:14,219 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 13:16:14,219 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:14,227 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:16:15,709 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 13:16:21,275 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 13:16:21,275 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:16:21,275 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 13:16:21,275 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:21,283 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:16:22,901 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 13:16:28,985 - root - [INFO] - 	!!!Scores: {'accuracy': 0.629, 'average': 0.629}
2024-05-01 13:16:28,985 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:16:28,985 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 13:16:28,985 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:28,993 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:16:30,479 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 13:16:35,207 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 13:16:35,207 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:35,207 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 13:16:35,207 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:35,215 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 13:16:36,171 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:16:36,188 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:36,378 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:38,167 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 13:16:38,167 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:38,167 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 13:16:38,167 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:38,175 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:38,348 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:40,106 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 13:16:40,106 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:40,106 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 13:16:40,106 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:40,114 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:40,331 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:42,254 - root - [INFO] - 	!!!Scores: {'accuracy': 0.708, 'average': 0.708}
2024-05-01 13:16:42,254 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:42,254 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 13:16:42,254 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:42,262 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:42,478 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:44,405 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 13:16:44,406 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:44,406 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 13:16:44,406 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:44,413 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:44,595 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:46,344 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 13:16:46,344 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:46,344 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 13:16:46,344 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:46,352 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:46,527 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:48,326 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 13:16:48,327 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:48,327 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 13:16:48,327 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:48,334 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:48,507 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:50,315 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 13:16:50,316 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:50,316 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 13:16:50,316 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:50,323 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:50,597 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:52,365 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 13:16:52,365 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:52,365 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 13:16:52,365 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:52,373 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:52,546 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:54,339 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 13:16:54,339 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:16:54,339 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 13:16:54,340 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:54,347 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:16:54,636 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 13:16:56,396 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 13:16:56,396 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:16:56,396 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 13:16:56,396 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:56,404 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 13:16:57,376 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:16:57,393 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:16:57,611 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 13:16:59,197 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 13:16:59,197 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:16:59,197 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 13:16:59,197 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:16:59,204 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:16:59,412 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 13:17:01,029 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 13:17:01,030 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:17:01,030 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 13:17:01,030 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:17:01,037 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:17:01,247 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 13:17:02,843 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 13:17:02,844 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:17:02,844 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 13:17:02,844 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:17:02,851 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:17:03,074 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 13:17:04,606 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 13:17:04,606 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:17:04,606 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 13:17:04,606 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:17:04,613 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:17:04,821 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 13:17:06,419 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 13:17:06,419 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:17:06,419 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 13:17:06,419 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:17:06,427 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:17:06,637 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 13:17:08,242 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 13:17:08,242 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:17:08,242 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 13:17:08,242 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:17:08,250 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:17:08,457 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 13:17:10,017 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 13:17:10,017 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:17:10,017 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 13:17:10,017 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:17:10,025 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:17:10,233 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 13:17:11,785 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 13:17:11,785 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 13:17:11,785 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 13:17:11,786 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:17:11,799 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 13:17:12,954 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:17:13,640 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 13:17:35,364 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 13:22:48,525 - root - [INFO] - 	!!!Scores: {'accuracy': 0.427, 'average': 0.427}
2024-05-01 13:22:48,525 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:22:48,525 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 13:22:48,525 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:22:49,506 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 13:22:49,618 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:22:55,568 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:23:19,869 - root - [INFO] - 	!!!Scores: {'accuracy': 0.933, 'average': 0.933}
2024-05-01 13:23:19,869 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:23:19,869 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 13:23:19,869 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:23:19,877 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:23:25,908 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:23:50,673 - root - [INFO] - 	!!!Scores: {'accuracy': 0.931, 'average': 0.931}
2024-05-01 13:23:50,673 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:23:50,673 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 13:23:50,673 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:23:50,682 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:23:56,685 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:24:21,277 - root - [INFO] - 	!!!Scores: {'accuracy': 0.933, 'average': 0.933}
2024-05-01 13:24:21,277 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:24:21,277 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 13:24:21,278 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:24:21,286 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:24:27,330 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:24:51,939 - root - [INFO] - 	!!!Scores: {'accuracy': 0.93, 'average': 0.93}
2024-05-01 13:24:51,940 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:24:51,940 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 13:24:51,940 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:24:51,948 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:24:58,146 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:25:23,446 - root - [INFO] - 	!!!Scores: {'accuracy': 0.929, 'average': 0.929}
2024-05-01 13:25:23,446 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:25:23,446 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 13:25:23,446 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:25:24,422 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:25:24,475 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:25:26,283 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:25:44,759 - root - [INFO] - 	!!!Scores: {'accuracy': 0.627, 'average': 0.627}
2024-05-01 13:25:44,759 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:25:44,759 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 13:25:44,759 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:25:44,768 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:25:46,614 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:26:03,155 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 13:26:03,155 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:26:03,155 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 13:26:03,155 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:26:03,163 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:26:04,979 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:26:21,239 - root - [INFO] - 	!!!Scores: {'accuracy': 0.687, 'average': 0.687}
2024-05-01 13:26:21,239 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:26:21,239 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 13:26:21,239 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:26:21,247 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:26:23,046 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:26:39,477 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 13:26:39,478 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:26:39,478 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 13:26:39,478 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:26:39,486 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:26:41,284 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:26:58,076 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 13:26:58,076 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:26:58,076 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 13:26:58,076 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:26:58,085 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:26:59,905 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:27:16,372 - root - [INFO] - 	!!!Scores: {'accuracy': 0.68, 'average': 0.68}
2024-05-01 13:27:16,372 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:27:16,373 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 13:27:16,373 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:27:16,381 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:27:18,758 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:27:36,890 - root - [INFO] - 	!!!Scores: {'accuracy': 0.659, 'average': 0.659}
2024-05-01 13:27:36,890 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:27:36,891 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 13:27:36,891 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:27:36,898 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:27:38,747 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:27:55,451 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 13:27:55,452 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:27:55,452 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 13:27:55,452 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:27:55,461 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:27:57,298 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:28:13,824 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 13:28:13,824 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:28:13,824 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 13:28:13,824 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:28:13,832 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:28:16,202 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:28:33,365 - root - [INFO] - 	!!!Scores: {'accuracy': 0.62, 'average': 0.62}
2024-05-01 13:28:33,365 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:28:33,365 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 13:28:33,365 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:28:33,373 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:28:35,738 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:28:52,583 - root - [INFO] - 	!!!Scores: {'accuracy': 0.659, 'average': 0.659}
2024-05-01 13:28:52,583 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:28:52,583 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 13:28:52,583 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:28:52,591 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:28:54,386 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:29:10,917 - root - [INFO] - 	!!!Scores: {'accuracy': 0.68, 'average': 0.68}
2024-05-01 13:29:10,918 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:29:10,918 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 13:29:10,918 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:29:10,926 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:29:13,277 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:29:30,150 - root - [INFO] - 	!!!Scores: {'accuracy': 0.631, 'average': 0.631}
2024-05-01 13:29:30,151 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:29:30,151 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 13:29:30,151 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:29:30,160 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:29:32,505 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:29:50,245 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 13:29:50,245 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:29:50,245 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 13:29:50,245 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:29:50,253 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:29:52,088 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:30:08,611 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 13:30:08,612 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:30:08,612 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 13:30:08,612 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:30:09,807 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:30:09,858 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:30:11,658 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:30:29,573 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 13:30:29,573 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:30:29,573 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 13:30:29,573 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:30:29,581 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:30:31,414 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:30:47,662 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 13:30:47,662 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:30:47,663 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 13:30:47,663 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:30:47,670 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:30:49,478 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:31:05,334 - root - [INFO] - 	!!!Scores: {'accuracy': 0.53, 'average': 0.53}
2024-05-01 13:31:05,334 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:31:05,334 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 13:31:05,334 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:31:05,342 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:31:07,137 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:31:23,178 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 13:31:23,179 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:31:23,179 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 13:31:23,179 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:31:23,188 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:31:24,979 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:31:41,437 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 13:31:41,437 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:31:41,437 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 13:31:41,437 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:31:41,445 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:31:43,257 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:31:59,366 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 13:31:59,366 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:31:59,366 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 13:31:59,366 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:31:59,374 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:32:01,740 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:32:19,388 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 13:32:19,388 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:32:19,388 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 13:32:19,388 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:32:19,396 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:32:21,229 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:32:37,624 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 13:32:37,624 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:32:37,624 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 13:32:37,624 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:32:37,632 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:32:39,468 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:32:55,709 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 13:32:55,709 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:32:55,709 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 13:32:55,710 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:32:55,718 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:32:58,089 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:33:14,946 - root - [INFO] - 	!!!Scores: {'accuracy': 0.49, 'average': 0.49}
2024-05-01 13:33:14,946 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:33:14,946 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 13:33:14,946 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:33:14,955 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:33:17,314 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:33:33,888 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 13:33:33,888 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:33:33,888 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 13:33:33,888 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:33:33,896 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:33:35,688 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:33:51,923 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 13:33:51,923 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:33:51,923 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 13:33:51,924 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:33:51,931 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:33:54,276 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:34:10,817 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 13:34:10,817 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:34:10,817 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 13:34:10,817 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:34:10,825 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:34:13,164 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:34:30,535 - root - [INFO] - 	!!!Scores: {'accuracy': 0.509, 'average': 0.509}
2024-05-01 13:34:30,535 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:34:30,535 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 13:34:30,535 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:34:30,543 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:34:32,373 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:34:48,613 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 13:34:48,613 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:34:48,613 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 13:34:48,613 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:34:49,591 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:34:49,651 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:34:51,804 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:35:17,093 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 13:35:17,093 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:35:17,094 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 13:35:17,094 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:35:17,103 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:35:19,302 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:35:42,151 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 13:35:42,151 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:35:42,151 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 13:35:42,151 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:35:42,159 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:35:44,327 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:36:06,796 - root - [INFO] - 	!!!Scores: {'accuracy': 0.486, 'average': 0.486}
2024-05-01 13:36:06,796 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:36:06,796 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 13:36:06,797 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:36:06,804 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:36:08,957 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:36:31,632 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 13:36:31,633 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:36:31,633 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 13:36:31,633 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:36:31,641 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:36:33,786 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:36:57,022 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 13:36:57,022 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:36:57,022 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 13:36:57,022 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:36:57,031 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:36:59,203 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:37:21,948 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 13:37:21,949 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:37:21,949 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 13:37:21,949 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:37:21,957 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:37:24,792 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:37:49,655 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 13:37:49,655 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:37:49,655 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 13:37:49,655 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:37:49,663 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:37:51,859 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:38:15,001 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 13:38:15,001 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:38:15,001 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 13:38:15,002 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:38:15,009 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:38:17,417 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:38:40,261 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 13:38:40,261 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:38:40,261 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 13:38:40,261 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:38:40,270 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:38:43,134 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:39:06,852 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 13:39:06,853 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:39:06,853 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 13:39:06,853 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:39:06,861 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:39:09,692 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:39:33,049 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 13:39:33,049 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:39:33,049 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 13:39:33,050 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:39:33,057 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:39:35,214 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:39:58,052 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 13:39:58,053 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:39:58,053 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 13:39:58,053 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:39:58,062 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:40:00,900 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:40:24,260 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 13:40:24,260 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:40:24,260 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 13:40:24,260 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:40:24,268 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:40:27,082 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:40:51,474 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 13:40:51,474 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 13:40:51,474 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 13:40:51,474 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:40:51,482 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 13:40:53,697 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 13:41:16,542 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 13:41:16,619 - root - [INFO] - Unexpected keys: []
2024-05-01 13:41:16,850 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:41:16,850 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 13:41:16,850 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:41:16,859 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 13:41:17,589 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:41:17,613 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:41:18,129 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 13:41:23,882 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 13:41:23,882 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:41:23,882 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 13:41:23,883 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:41:23,890 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:41:24,405 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 13:41:29,848 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 13:41:29,849 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:41:29,849 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 13:41:29,849 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:41:29,856 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:41:30,371 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 13:41:35,817 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 13:41:35,817 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:41:35,817 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 13:41:35,817 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:41:35,825 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:41:36,336 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 13:41:41,709 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 13:41:41,709 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:41:41,709 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 13:41:41,709 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:41:41,716 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:41:42,228 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 13:41:47,674 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 13:41:47,674 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:41:47,674 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 13:41:47,674 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:41:47,682 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:41:48,197 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 13:41:53,644 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 13:41:53,644 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:41:53,644 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 13:41:53,644 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:41:53,652 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:41:54,168 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 13:41:59,549 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 13:41:59,549 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:41:59,549 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 13:41:59,550 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:41:59,557 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:42:00,067 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 13:42:05,651 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 13:42:05,651 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:42:05,651 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 13:42:05,651 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:05,659 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:42:06,181 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 13:42:11,628 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 13:42:11,628 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 13:42:11,628 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 13:42:11,628 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:11,636 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 13:42:12,156 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 13:42:17,695 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 13:42:17,695 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:17,695 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 13:42:17,695 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:17,703 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 13:42:18,651 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:42:18,659 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:18,724 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:20,168 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:42:20,168 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:20,168 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 13:42:20,168 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:20,175 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:20,226 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:21,673 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:42:21,673 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:21,673 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 13:42:21,673 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:21,681 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:21,745 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:23,212 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:42:23,212 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:23,212 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 13:42:23,212 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:23,219 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:23,270 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:24,704 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:42:24,704 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:24,704 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 13:42:24,704 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:24,712 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:24,762 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:26,201 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:42:26,201 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:26,201 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 13:42:26,202 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:26,209 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:26,273 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:27,721 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 13:42:27,722 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:27,722 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 13:42:27,722 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:27,729 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:27,780 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:29,217 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 13:42:29,217 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:29,217 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 13:42:29,218 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:29,225 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:29,289 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:30,740 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 13:42:30,740 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:30,740 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 13:42:30,740 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:30,748 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:30,799 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:32,243 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:42:32,243 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:32,243 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 13:42:32,243 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:32,251 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:32,302 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:33,746 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:42:33,746 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:33,746 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 13:42:33,746 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:33,754 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:33,818 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:35,274 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 13:42:35,275 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:35,275 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 13:42:35,275 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:35,282 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:35,333 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:36,775 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:42:36,775 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:36,775 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 13:42:36,775 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:36,782 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:36,834 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:38,321 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 13:42:38,321 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:38,321 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 13:42:38,321 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:38,328 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:38,379 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:39,822 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:42:39,822 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 13:42:39,822 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 13:42:39,822 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:39,829 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 13:42:39,894 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 13:42:41,372 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 13:42:41,372 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:42:41,372 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 13:42:41,372 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:42,577 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:42:42,633 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:42:46,265 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:42:55,895 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 13:42:55,895 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:42:55,895 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 13:42:55,895 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:42:55,903 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:42:59,421 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:43:09,128 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 13:43:09,128 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:43:09,128 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 13:43:09,128 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:43:09,137 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:43:12,721 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:43:22,499 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 13:43:22,500 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:43:22,500 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 13:43:22,500 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:43:22,508 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:43:26,128 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:43:36,228 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 13:43:36,228 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 13:43:36,228 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 13:43:36,228 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:43:36,236 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 13:43:39,843 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 13:43:49,822 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 13:43:49,822 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:43:49,822 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 13:43:49,822 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:43:49,830 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 13:43:50,791 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:43:50,844 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:43:52,330 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 13:43:57,650 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 13:43:57,651 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:43:57,651 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 13:43:57,651 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:43:57,658 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:43:59,149 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 13:44:04,332 - root - [INFO] - 	!!!Scores: {'accuracy': 0.65, 'average': 0.65}
2024-05-01 13:44:04,332 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:44:04,332 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 13:44:04,332 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:44:04,340 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:44:05,834 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 13:44:11,533 - root - [INFO] - 	!!!Scores: {'accuracy': 0.644, 'average': 0.644}
2024-05-01 13:44:11,533 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:44:11,533 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 13:44:11,533 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:44:11,541 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:44:13,030 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 13:44:18,856 - root - [INFO] - 	!!!Scores: {'accuracy': 0.538, 'average': 0.538}
2024-05-01 13:44:18,856 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:44:18,856 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 13:44:18,856 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:44:18,863 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:44:20,340 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 13:44:25,746 - root - [INFO] - 	!!!Scores: {'accuracy': 0.541, 'average': 0.541}
2024-05-01 13:44:25,746 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:44:25,747 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 13:44:25,747 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:44:25,754 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:44:27,248 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 13:44:33,089 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 13:44:33,089 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:44:33,089 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 13:44:33,089 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:44:33,097 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:44:34,587 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 13:44:39,995 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 13:44:39,995 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:44:39,995 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 13:44:39,996 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:44:40,003 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:44:41,480 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 13:44:47,046 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 13:44:47,047 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:44:47,047 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 13:44:47,047 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:44:47,055 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:44:48,548 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 13:44:54,624 - root - [INFO] - 	!!!Scores: {'accuracy': 0.627, 'average': 0.627}
2024-05-01 13:44:54,625 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 13:44:54,625 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 13:44:54,625 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:44:54,633 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 13:44:56,104 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 13:45:00,834 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 13:45:00,834 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:00,834 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 13:45:00,835 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:00,842 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 13:45:01,798 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:45:01,816 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:02,005 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:03,794 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 13:45:03,794 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:03,794 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 13:45:03,794 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:03,802 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:03,974 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:05,733 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 13:45:05,733 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:05,733 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 13:45:05,733 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:05,741 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:05,957 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:07,880 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 13:45:07,880 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:07,880 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 13:45:07,880 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:07,888 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:08,102 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:10,029 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 13:45:10,029 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:10,029 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 13:45:10,029 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:10,037 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:10,218 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:11,969 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 13:45:11,969 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:11,969 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 13:45:11,969 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:11,976 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:12,151 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:13,949 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 13:45:13,949 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:13,949 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 13:45:13,950 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:13,957 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:14,129 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:15,938 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 13:45:15,938 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:15,938 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 13:45:15,938 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:15,946 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:16,218 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:17,986 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 13:45:17,986 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:17,986 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 13:45:17,986 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:17,994 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:18,166 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:19,961 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 13:45:19,961 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 13:45:19,961 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 13:45:19,961 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:19,968 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 13:45:20,256 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 13:45:22,015 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 13:45:22,015 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:45:22,015 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 13:45:22,015 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:22,022 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 13:45:22,966 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:45:22,982 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:45:23,200 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 13:45:24,784 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 13:45:24,785 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:45:24,785 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 13:45:24,785 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:24,792 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:45:24,999 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 13:45:26,616 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 13:45:26,616 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:45:26,616 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 13:45:26,617 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:26,624 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:45:26,832 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 13:45:28,427 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 13:45:28,427 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:45:28,427 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 13:45:28,427 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:28,434 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:45:28,656 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 13:45:30,189 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 13:45:30,189 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:45:30,189 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 13:45:30,189 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:30,197 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:45:30,404 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 13:45:32,002 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 13:45:32,003 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:45:32,003 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 13:45:32,003 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:32,010 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:45:32,220 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 13:45:33,824 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 13:45:33,825 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:45:33,825 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 13:45:33,825 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:33,832 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:45:34,040 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 13:45:35,600 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 13:45:35,600 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 13:45:35,600 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 13:45:35,600 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:35,608 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 13:45:35,815 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 13:45:37,368 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 13:45:37,368 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 13:45:37,369 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 13:45:37,369 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:45:37,381 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 13:45:38,315 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:45:39,003 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 13:46:00,696 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 13:51:13,831 - root - [INFO] - 	!!!Scores: {'accuracy': 0.426, 'average': 0.426}
2024-05-01 13:51:13,832 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:51:13,832 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 13:51:13,832 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:51:14,824 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 13:51:14,935 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:51:21,107 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:51:45,410 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 13:51:45,410 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:51:45,410 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 13:51:45,410 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:51:45,419 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:51:51,517 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:52:16,261 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 13:52:16,261 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:52:16,261 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 13:52:16,262 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:52:16,270 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:52:22,316 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:52:46,894 - root - [INFO] - 	!!!Scores: {'accuracy': 0.927, 'average': 0.927}
2024-05-01 13:52:46,894 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:52:46,894 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 13:52:46,894 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:52:46,902 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:52:52,952 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:53:17,551 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 13:53:17,551 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 13:53:17,551 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 13:53:17,551 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:53:17,561 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 13:53:23,610 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 13:53:48,879 - root - [INFO] - 	!!!Scores: {'accuracy': 0.916, 'average': 0.916}
2024-05-01 13:53:48,879 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:53:48,879 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 13:53:48,879 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:53:49,631 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:53:49,685 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:53:51,488 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:54:09,951 - root - [INFO] - 	!!!Scores: {'accuracy': 0.64, 'average': 0.64}
2024-05-01 13:54:09,951 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:54:09,951 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 13:54:09,951 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:54:09,959 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:54:11,803 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:54:28,330 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 13:54:28,330 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:54:28,330 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 13:54:28,330 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:54:28,338 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:54:30,152 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:54:46,397 - root - [INFO] - 	!!!Scores: {'accuracy': 0.68, 'average': 0.68}
2024-05-01 13:54:46,397 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:54:46,397 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 13:54:46,397 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:54:46,406 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:54:48,211 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:55:04,626 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 13:55:04,626 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:55:04,626 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 13:55:04,627 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:55:04,635 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:55:06,462 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:55:23,230 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 13:55:23,230 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:55:23,230 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 13:55:23,230 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:55:23,238 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:55:25,051 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:55:41,504 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 13:55:41,504 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:55:41,505 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 13:55:41,505 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:55:41,513 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:55:43,883 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:56:02,005 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 13:56:02,005 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:56:02,005 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 13:56:02,005 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:56:02,013 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:56:03,859 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:56:20,552 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 13:56:20,553 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:56:20,553 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 13:56:20,553 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:56:20,561 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:56:22,395 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:56:38,916 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 13:56:38,916 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:56:38,916 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 13:56:38,916 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:56:38,925 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:56:41,325 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:56:58,485 - root - [INFO] - 	!!!Scores: {'accuracy': 0.638, 'average': 0.638}
2024-05-01 13:56:58,485 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:56:58,485 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 13:56:58,485 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:56:58,493 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:57:00,856 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:57:17,697 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 13:57:17,697 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:57:17,697 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 13:57:17,697 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:57:17,705 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:57:19,502 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:57:36,026 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 13:57:36,026 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:57:36,026 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 13:57:36,026 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:57:36,034 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:57:38,384 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:57:55,251 - root - [INFO] - 	!!!Scores: {'accuracy': 0.638, 'average': 0.638}
2024-05-01 13:57:55,251 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:57:55,251 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 13:57:55,251 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:57:55,259 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:57:57,612 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:58:15,350 - root - [INFO] - 	!!!Scores: {'accuracy': 0.659, 'average': 0.659}
2024-05-01 13:58:15,350 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 13:58:15,350 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 13:58:15,350 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:58:15,358 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:58:17,192 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:58:33,717 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 13:58:33,717 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:58:33,717 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 13:58:33,718 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:58:34,681 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 13:58:34,735 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:58:36,534 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:58:54,456 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 13:58:54,457 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:58:54,457 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 13:58:54,457 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:58:54,465 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:58:56,303 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:59:12,540 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 13:59:12,540 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:59:12,540 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 13:59:12,540 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:59:12,548 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:59:14,371 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:59:30,215 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 13:59:30,215 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:59:30,215 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 13:59:30,215 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:59:30,223 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:59:32,022 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 13:59:48,059 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 13:59:48,059 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 13:59:48,059 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 13:59:48,059 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 13:59:48,067 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 13:59:49,860 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:00:06,339 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 14:00:06,339 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:00:06,339 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 14:00:06,340 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:00:06,348 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:00:08,169 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:00:24,286 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 14:00:24,286 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:00:24,286 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 14:00:24,286 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:00:24,294 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:00:26,686 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:00:44,336 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 14:00:44,337 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:00:44,337 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 14:00:44,337 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:00:44,346 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:00:46,211 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:01:02,623 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 14:01:02,623 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:01:02,623 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 14:01:02,623 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:01:02,631 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:01:04,473 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:01:20,720 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 14:01:20,720 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:01:20,720 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 14:01:20,720 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:01:20,728 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:01:23,110 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:01:39,972 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 14:01:39,972 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:01:39,972 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 14:01:39,972 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:01:39,980 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:01:42,371 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:01:58,951 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 14:01:58,951 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:01:58,951 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 14:01:58,952 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:01:58,964 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:02:00,773 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:02:17,039 - root - [INFO] - 	!!!Scores: {'accuracy': 0.522, 'average': 0.522}
2024-05-01 14:02:17,040 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:02:17,040 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 14:02:17,040 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:02:17,048 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:02:19,416 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:02:35,968 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 14:02:35,968 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:02:35,968 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 14:02:35,968 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:02:35,977 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:02:38,542 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:02:55,936 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 14:02:55,936 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:02:55,937 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 14:02:55,937 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:02:55,945 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:02:57,816 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:03:14,065 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 14:03:14,065 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:03:14,065 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 14:03:14,065 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:03:15,080 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:03:15,141 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:03:17,342 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:03:42,623 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 14:03:42,623 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:03:42,624 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 14:03:42,624 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:03:42,631 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:03:44,834 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:04:07,686 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 14:04:07,686 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:04:07,686 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 14:04:07,686 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:04:07,695 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:04:09,866 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:04:32,330 - root - [INFO] - 	!!!Scores: {'accuracy': 0.485, 'average': 0.485}
2024-05-01 14:04:32,331 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:04:32,331 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 14:04:32,331 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:04:32,339 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:04:34,490 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:04:57,149 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 14:04:57,150 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:04:57,150 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 14:04:57,150 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:04:57,158 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:04:59,309 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:05:22,545 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 14:05:22,546 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:05:22,546 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 14:05:22,546 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:05:22,555 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:05:24,730 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:05:47,473 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 14:05:47,473 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:05:47,473 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 14:05:47,473 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:05:47,481 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:05:50,320 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:06:15,172 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 14:06:15,173 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:06:15,173 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 14:06:15,173 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:06:15,180 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:06:17,382 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:06:40,516 - root - [INFO] - 	!!!Scores: {'accuracy': 0.485, 'average': 0.485}
2024-05-01 14:06:40,516 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:06:40,516 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 14:06:40,516 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:06:40,524 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:06:42,723 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:07:05,555 - root - [INFO] - 	!!!Scores: {'accuracy': 0.485, 'average': 0.485}
2024-05-01 14:07:05,555 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:07:05,555 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 14:07:05,555 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:07:05,564 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:07:08,404 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:07:32,119 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 14:07:32,120 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:07:32,120 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 14:07:32,120 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:07:32,128 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:07:34,962 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:07:58,321 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 14:07:58,321 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:07:58,321 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 14:07:58,321 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:07:58,329 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:08:00,482 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:08:23,330 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 14:08:23,330 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:08:23,330 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 14:08:23,330 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:08:23,338 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:08:26,155 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:08:49,520 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 14:08:49,520 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:08:49,520 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 14:08:49,520 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:08:49,529 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:08:52,335 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:09:16,727 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 14:09:16,727 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:09:16,727 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 14:09:16,727 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:09:16,735 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:09:18,946 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:09:41,790 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 14:09:41,866 - root - [INFO] - Unexpected keys: []
2024-05-01 14:09:42,096 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:09:42,096 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 14:09:42,096 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:09:42,104 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 14:09:43,072 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:09:43,095 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:09:43,609 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 14:09:49,363 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 14:09:49,363 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:09:49,363 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 14:09:49,363 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:09:49,371 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:09:49,886 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 14:09:55,333 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 14:09:55,333 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:09:55,333 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 14:09:55,333 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:09:55,340 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:09:55,856 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 14:10:01,303 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 14:10:01,303 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:10:01,303 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 14:10:01,303 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:01,311 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:10:01,822 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 14:10:07,195 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 14:10:07,195 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:10:07,195 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 14:10:07,195 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:07,203 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:10:07,715 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 14:10:13,163 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 14:10:13,163 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:10:13,163 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 14:10:13,163 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:13,171 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:10:13,687 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 14:10:19,135 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 14:10:19,135 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:10:19,135 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 14:10:19,135 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:19,143 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:10:19,660 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 14:10:25,044 - root - [INFO] - 	!!!Scores: {'accuracy': 0.767, 'average': 0.767}
2024-05-01 14:10:25,044 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:10:25,044 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 14:10:25,044 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:25,052 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:10:25,563 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 14:10:31,144 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 14:10:31,144 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:10:31,144 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 14:10:31,144 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:31,152 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:10:31,663 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 14:10:37,104 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:10:37,104 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:10:37,105 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 14:10:37,105 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:37,112 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:10:37,631 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 14:10:43,172 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 14:10:43,172 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:43,172 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 14:10:43,172 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:43,180 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 14:10:44,132 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:10:44,142 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:44,206 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:45,649 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:10:45,649 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:45,649 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 14:10:45,650 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:45,657 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:45,708 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:47,154 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:10:47,155 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:47,155 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 14:10:47,155 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:47,162 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:47,226 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:48,694 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:10:48,694 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:48,694 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 14:10:48,695 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:48,702 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:48,753 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:50,188 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:10:50,189 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:50,189 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 14:10:50,189 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:50,196 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:50,246 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:51,688 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:10:51,688 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:51,688 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 14:10:51,688 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:51,696 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:51,760 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:53,210 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:10:53,210 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:53,210 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 14:10:53,210 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:53,218 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:53,268 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:54,707 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 14:10:54,707 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:54,707 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 14:10:54,707 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:54,714 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:54,778 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:56,227 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 14:10:56,228 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:56,228 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 14:10:56,228 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:56,235 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:56,286 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:57,729 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:10:57,729 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:57,729 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 14:10:57,729 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:57,737 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:57,788 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 14:10:59,234 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:10:59,234 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:10:59,234 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 14:10:59,235 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:10:59,242 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:10:59,307 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 14:11:00,763 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:11:00,763 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:11:00,763 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 14:11:00,763 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:11:00,771 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:11:00,822 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 14:11:02,264 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:11:02,264 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:11:02,264 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 14:11:02,264 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:11:02,272 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:11:02,323 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 14:11:03,810 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:11:03,810 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:11:03,810 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 14:11:03,810 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:11:03,817 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:11:03,868 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 14:11:05,311 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:11:05,312 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:11:05,312 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 14:11:05,312 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:11:05,319 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:11:05,383 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 14:11:06,861 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:11:06,861 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:11:06,861 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 14:11:06,862 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:11:07,825 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:11:07,882 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:11:11,475 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:11:21,114 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 14:11:21,115 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:11:21,115 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 14:11:21,115 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:11:21,123 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:11:24,650 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:11:34,357 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 14:11:34,357 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:11:34,357 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 14:11:34,358 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:11:34,366 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:11:37,960 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:11:47,736 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 14:11:47,736 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:11:47,736 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 14:11:47,736 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:11:47,744 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:11:51,370 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:12:01,471 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 14:12:01,472 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:12:01,472 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 14:12:01,472 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:12:01,480 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:12:05,074 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:12:15,052 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 14:12:15,052 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:12:15,052 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 14:12:15,052 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:12:15,060 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 14:12:16,071 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:12:16,125 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:12:17,610 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 14:12:22,930 - root - [INFO] - 	!!!Scores: {'accuracy': 0.652, 'average': 0.652}
2024-05-01 14:12:22,931 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:12:22,931 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 14:12:22,931 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:12:22,938 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:12:24,425 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 14:12:29,610 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 14:12:29,610 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:12:29,610 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 14:12:29,610 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:12:29,618 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:12:31,112 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 14:12:36,814 - root - [INFO] - 	!!!Scores: {'accuracy': 0.65, 'average': 0.65}
2024-05-01 14:12:36,815 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:12:36,815 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 14:12:36,815 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:12:36,823 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:12:38,312 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 14:12:44,136 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 14:12:44,136 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:12:44,136 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 14:12:44,136 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:12:44,144 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:12:45,621 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 14:12:51,032 - root - [INFO] - 	!!!Scores: {'accuracy': 0.531, 'average': 0.531}
2024-05-01 14:12:51,032 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:12:51,032 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 14:12:51,032 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:12:51,040 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:12:52,534 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 14:12:58,368 - root - [INFO] - 	!!!Scores: {'accuracy': 0.632, 'average': 0.632}
2024-05-01 14:12:58,368 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:12:58,368 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 14:12:58,368 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:12:58,376 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:12:59,863 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 14:13:05,280 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 14:13:05,280 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:13:05,280 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 14:13:05,280 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:05,288 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:13:06,765 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 14:13:12,329 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 14:13:12,329 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:13:12,329 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 14:13:12,330 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:12,337 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:13:13,832 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 14:13:19,910 - root - [INFO] - 	!!!Scores: {'accuracy': 0.632, 'average': 0.632}
2024-05-01 14:13:19,910 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:13:19,910 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 14:13:19,910 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:19,918 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:13:21,390 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 14:13:26,116 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 14:13:26,116 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:26,116 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 14:13:26,116 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:26,124 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 14:13:27,091 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:13:27,107 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:27,296 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:29,087 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 14:13:29,087 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:29,087 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 14:13:29,087 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:29,095 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:29,266 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:31,026 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 14:13:31,027 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:31,027 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 14:13:31,027 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:31,034 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:31,250 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:33,173 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 14:13:33,173 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:33,173 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 14:13:33,174 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:33,181 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:33,396 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:35,323 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 14:13:35,323 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:35,323 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 14:13:35,323 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:35,331 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:35,512 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:37,270 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 14:13:37,270 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:37,270 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 14:13:37,270 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:37,278 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:37,453 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:39,250 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 14:13:39,250 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:39,251 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 14:13:39,251 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:39,258 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:39,430 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:41,238 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 14:13:41,238 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:41,238 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 14:13:41,238 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:41,246 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:41,518 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:43,285 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 14:13:43,285 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:43,285 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 14:13:43,285 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:43,293 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:43,465 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:45,261 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 14:13:45,261 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:13:45,261 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 14:13:45,261 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:45,268 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:13:45,556 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 14:13:47,318 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 14:13:47,318 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:13:47,318 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 14:13:47,318 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:47,326 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 14:13:48,333 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:13:48,348 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:13:48,563 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 14:13:50,150 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 14:13:50,150 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:13:50,150 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 14:13:50,150 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:50,158 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:13:50,366 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 14:13:51,984 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 14:13:51,984 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:13:51,985 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 14:13:51,985 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:51,992 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:13:52,200 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 14:13:53,797 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 14:13:53,797 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:13:53,797 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 14:13:53,797 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:53,805 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:13:54,028 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 14:13:55,563 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 14:13:55,563 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:13:55,563 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 14:13:55,563 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:55,571 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:13:55,779 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 14:13:57,379 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 14:13:57,379 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:13:57,379 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 14:13:57,379 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:57,387 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:13:57,602 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 14:13:59,210 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 14:13:59,210 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:13:59,210 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 14:13:59,210 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:13:59,217 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:13:59,426 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 14:14:00,995 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 14:14:00,996 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:14:00,996 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 14:14:00,996 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:14:01,004 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:14:01,212 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 14:14:02,767 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 14:14:02,767 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 14:14:02,767 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 14:14:02,767 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:14:02,780 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 14:14:03,754 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:14:04,440 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 14:14:26,592 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 14:19:40,559 - root - [INFO] - 	!!!Scores: {'accuracy': 0.425, 'average': 0.425}
2024-05-01 14:19:40,559 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:19:40,559 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 14:19:40,559 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:19:41,535 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 14:19:41,645 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:19:47,625 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:20:11,961 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 14:20:11,961 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:20:11,961 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 14:20:11,962 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:20:11,971 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:20:18,069 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:20:42,992 - root - [INFO] - 	!!!Scores: {'accuracy': 0.92, 'average': 0.92}
2024-05-01 14:20:42,992 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:20:42,992 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 14:20:42,992 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:20:43,000 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:20:49,009 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:21:13,747 - root - [INFO] - 	!!!Scores: {'accuracy': 0.924, 'average': 0.924}
2024-05-01 14:21:13,747 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:21:13,747 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 14:21:13,747 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:21:13,756 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:21:19,809 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:21:44,587 - root - [INFO] - 	!!!Scores: {'accuracy': 0.914, 'average': 0.914}
2024-05-01 14:21:44,587 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:21:44,587 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 14:21:44,587 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:21:44,595 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:21:50,646 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:22:15,946 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 14:22:15,946 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:22:15,946 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 14:22:15,946 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:22:17,409 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:22:17,462 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:22:19,262 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:22:37,816 - root - [INFO] - 	!!!Scores: {'accuracy': 0.634, 'average': 0.634}
2024-05-01 14:22:37,816 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:22:37,816 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 14:22:37,816 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:22:37,825 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:22:39,667 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:22:56,310 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 14:22:56,310 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:22:56,311 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 14:22:56,311 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:22:56,319 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:22:58,143 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:23:14,524 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 14:23:14,524 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:23:14,524 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 14:23:14,524 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:23:14,533 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:23:16,363 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:23:32,953 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 14:23:32,953 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:23:32,953 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 14:23:32,954 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:23:32,963 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:23:34,803 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:23:51,684 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 14:23:51,684 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:23:51,684 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 14:23:51,684 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:23:51,694 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:23:53,524 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:24:10,017 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 14:24:10,017 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:24:10,017 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 14:24:10,017 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:24:10,026 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:24:12,414 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:24:30,566 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 14:24:30,567 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:24:30,567 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 14:24:30,567 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:24:30,576 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:24:32,453 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:24:49,190 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 14:24:49,191 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:24:49,191 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 14:24:49,191 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:24:49,200 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:24:51,055 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:25:07,629 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 14:25:07,629 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:25:07,629 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 14:25:07,630 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:25:07,639 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:25:10,040 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:25:27,352 - root - [INFO] - 	!!!Scores: {'accuracy': 0.64, 'average': 0.64}
2024-05-01 14:25:27,353 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:25:27,353 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 14:25:27,353 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:25:27,362 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:25:29,779 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:25:46,776 - root - [INFO] - 	!!!Scores: {'accuracy': 0.656, 'average': 0.656}
2024-05-01 14:25:46,776 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:25:46,776 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 14:25:46,776 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:25:46,786 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:25:48,620 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:26:05,329 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 14:26:05,329 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:26:05,329 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 14:26:05,329 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:26:05,338 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:26:07,949 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:26:24,982 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 14:26:24,982 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:26:24,982 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 14:26:24,982 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:26:24,992 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:26:27,355 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:26:45,244 - root - [INFO] - 	!!!Scores: {'accuracy': 0.66, 'average': 0.66}
2024-05-01 14:26:45,244 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:26:45,244 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 14:26:45,244 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:26:45,254 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:26:47,104 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:27:03,750 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 14:27:03,750 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:27:03,750 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 14:27:03,750 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:27:04,509 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:27:04,563 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:27:06,365 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:27:24,308 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 14:27:24,308 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:27:24,308 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 14:27:24,308 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:27:24,317 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:27:26,173 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:27:42,446 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 14:27:42,446 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:27:42,446 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 14:27:42,446 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:27:42,456 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:27:44,273 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:28:00,146 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 14:28:00,146 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:28:00,146 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 14:28:00,146 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:28:00,155 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:28:01,968 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:28:18,116 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 14:28:18,117 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:28:18,117 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 14:28:18,117 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:28:18,126 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:28:19,925 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:28:36,515 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 14:28:36,515 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:28:36,515 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 14:28:36,515 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:28:36,524 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:28:38,353 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:28:54,575 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 14:28:54,575 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:28:54,575 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 14:28:54,575 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:28:54,585 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:28:56,962 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:29:14,724 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 14:29:14,724 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:29:14,724 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 14:29:14,725 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:29:14,733 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:29:16,569 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:29:33,079 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 14:29:33,079 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:29:33,079 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 14:29:33,079 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:29:33,089 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:29:34,922 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:29:51,288 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 14:29:51,288 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:29:51,288 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 14:29:51,288 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:29:51,296 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:29:53,684 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:30:10,580 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 14:30:10,580 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:30:10,580 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 14:30:10,580 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:30:10,589 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:30:12,946 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:30:29,522 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 14:30:29,522 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:30:29,522 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 14:30:29,522 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:30:29,530 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:30:31,322 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:30:47,561 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 14:30:47,561 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:30:47,561 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 14:30:47,562 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:30:47,569 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:30:49,913 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:31:06,530 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 14:31:06,530 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:31:06,530 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 14:31:06,530 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:31:06,539 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:31:08,877 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:31:26,368 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 14:31:26,369 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:31:26,369 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 14:31:26,369 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:31:26,377 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:31:28,208 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:31:44,529 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 14:31:44,529 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:31:44,529 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 14:31:44,529 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:31:45,518 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:31:45,578 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:31:47,733 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:32:13,139 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 14:32:13,140 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:32:13,140 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 14:32:13,140 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:32:13,149 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:32:15,363 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:32:38,399 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 14:32:38,399 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:32:38,399 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 14:32:38,399 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:32:38,408 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:32:40,596 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:33:03,135 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 14:33:03,135 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:33:03,136 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 14:33:03,136 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:33:03,144 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:33:05,298 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:33:27,994 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 14:33:27,994 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:33:27,994 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 14:33:27,994 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:33:28,003 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:33:30,159 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:33:53,406 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 14:33:53,407 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:33:53,407 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 14:33:53,407 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:33:53,415 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:33:55,593 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:34:18,354 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 14:34:18,354 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:34:18,354 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 14:34:18,355 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:34:18,363 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:34:21,206 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:34:46,083 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 14:34:46,083 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:34:46,083 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 14:34:46,083 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:34:46,091 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:34:48,298 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:35:11,460 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 14:35:11,460 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:35:11,460 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 14:35:11,460 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:35:11,469 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:35:13,676 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:35:36,528 - root - [INFO] - 	!!!Scores: {'accuracy': 0.483, 'average': 0.483}
2024-05-01 14:35:36,528 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:35:36,528 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 14:35:36,528 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:35:36,536 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:35:39,381 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:36:03,110 - root - [INFO] - 	!!!Scores: {'accuracy': 0.49, 'average': 0.49}
2024-05-01 14:36:03,110 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:36:03,110 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 14:36:03,110 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:36:03,118 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:36:05,964 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:36:29,327 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 14:36:29,327 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:36:29,327 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 14:36:29,327 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:36:29,335 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:36:31,492 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:36:54,361 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 14:36:54,361 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:36:54,361 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 14:36:54,361 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:36:54,369 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:36:57,190 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:37:20,560 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 14:37:20,560 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:37:20,560 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 14:37:20,560 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:37:20,569 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:37:23,383 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:37:47,780 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 14:37:47,780 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 14:37:47,780 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 14:37:47,780 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:37:47,788 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 14:37:50,004 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 14:38:12,877 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 14:38:12,952 - root - [INFO] - Unexpected keys: []
2024-05-01 14:38:13,184 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:38:13,184 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 14:38:13,184 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:38:13,192 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 14:38:13,908 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:38:13,931 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:38:14,448 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 14:38:20,219 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 14:38:20,219 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:38:20,219 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 14:38:20,220 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:38:20,228 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:38:20,746 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 14:38:26,215 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 14:38:26,215 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:38:26,215 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 14:38:26,215 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:38:26,223 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:38:26,743 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 14:38:32,213 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 14:38:32,213 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:38:32,213 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 14:38:32,213 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:38:32,221 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:38:32,735 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 14:38:38,133 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 14:38:38,133 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:38:38,133 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 14:38:38,133 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:38:38,141 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:38:38,656 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 14:38:44,126 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 14:38:44,126 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:38:44,126 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 14:38:44,126 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:38:44,134 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:38:44,653 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 14:38:50,124 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 14:38:50,125 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:38:50,125 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 14:38:50,125 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:38:50,132 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:38:50,652 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 14:38:56,061 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 14:38:56,061 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:38:56,061 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 14:38:56,061 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:38:56,071 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:38:56,589 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 14:39:02,204 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 14:39:02,204 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:39:02,204 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 14:39:02,204 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:02,212 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:39:02,729 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 14:39:08,203 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:39:08,203 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 14:39:08,204 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 14:39:08,204 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:08,212 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 14:39:08,742 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 14:39:14,311 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:39:14,311 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:14,311 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 14:39:14,311 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:14,319 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 14:39:15,044 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:39:15,053 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:15,117 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:16,566 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:39:16,566 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:16,566 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 14:39:16,566 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:16,574 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:16,626 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:18,077 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:39:18,077 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:18,078 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 14:39:18,078 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:18,086 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:18,150 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:19,621 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:39:19,621 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:19,621 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 14:39:19,621 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:19,629 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:19,681 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:21,118 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:39:21,118 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:21,118 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 14:39:21,119 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:21,126 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:21,178 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:22,621 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:39:22,622 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:22,622 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 14:39:22,622 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:22,630 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:22,695 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:24,148 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:39:24,148 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:24,149 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 14:39:24,149 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:24,157 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:24,208 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:25,650 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 14:39:25,650 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:25,650 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 14:39:25,650 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:25,658 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:25,723 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:27,177 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 14:39:27,177 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:27,177 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 14:39:27,177 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:27,185 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:27,237 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:28,685 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:39:28,685 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:28,685 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 14:39:28,685 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:28,693 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:28,745 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:30,194 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:39:30,194 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:30,194 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 14:39:30,195 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:30,202 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:30,267 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:31,725 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 14:39:31,726 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:31,726 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 14:39:31,726 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:31,733 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:31,784 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:33,228 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 14:39:33,228 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:33,228 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 14:39:33,228 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:33,236 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:33,287 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:34,775 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 14:39:34,775 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:34,775 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 14:39:34,775 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:34,783 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:34,835 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:36,280 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:39:36,280 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 14:39:36,280 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 14:39:36,281 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:36,288 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 14:39:36,353 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 14:39:37,834 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 14:39:37,834 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:39:37,834 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 14:39:37,834 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:38,787 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:39:38,842 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:39:42,453 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:39:52,168 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 14:39:52,168 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:39:52,168 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 14:39:52,169 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:39:52,177 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:39:55,721 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:40:05,540 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 14:40:05,540 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:40:05,540 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 14:40:05,540 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:40:05,548 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:40:09,299 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:40:19,094 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 14:40:19,095 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:40:19,095 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 14:40:19,095 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:40:19,103 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:40:22,744 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:40:32,847 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 14:40:32,847 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 14:40:32,847 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 14:40:32,847 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:40:32,856 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 14:40:36,465 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 14:40:46,451 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 14:40:46,451 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:40:46,451 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 14:40:46,451 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:40:46,459 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 14:40:47,187 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:40:47,238 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:40:48,734 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 14:40:54,061 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 14:40:54,062 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:40:54,062 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 14:40:54,062 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:40:54,070 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:40:55,565 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 14:41:00,752 - root - [INFO] - 	!!!Scores: {'accuracy': 0.65, 'average': 0.65}
2024-05-01 14:41:00,752 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:41:00,752 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 14:41:00,752 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:41:00,760 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:41:02,259 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 14:41:07,961 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 14:41:07,961 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:41:07,961 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 14:41:07,961 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:41:07,969 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:41:09,465 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 14:41:15,293 - root - [INFO] - 	!!!Scores: {'accuracy': 0.525, 'average': 0.525}
2024-05-01 14:41:15,293 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:41:15,293 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 14:41:15,293 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:41:15,301 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:41:16,784 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 14:41:22,192 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 14:41:22,192 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:41:22,192 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 14:41:22,192 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:41:22,200 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:41:23,711 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 14:41:29,550 - root - [INFO] - 	!!!Scores: {'accuracy': 0.62, 'average': 0.62}
2024-05-01 14:41:29,550 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:41:29,550 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 14:41:29,550 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:41:29,558 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:41:31,050 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 14:41:36,461 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 14:41:36,461 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:41:36,461 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 14:41:36,461 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:41:36,469 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:41:37,949 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 14:41:43,518 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 14:41:43,519 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:41:43,519 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 14:41:43,519 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:41:43,526 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:41:45,021 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 14:41:51,103 - root - [INFO] - 	!!!Scores: {'accuracy': 0.627, 'average': 0.627}
2024-05-01 14:41:51,103 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 14:41:51,103 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 14:41:51,103 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:41:51,111 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 14:41:52,586 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 14:41:57,320 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 14:41:57,320 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:41:57,320 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 14:41:57,321 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:41:57,328 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 14:41:58,063 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:41:58,080 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:41:58,269 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:00,059 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 14:42:00,059 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:42:00,059 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 14:42:00,060 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:00,067 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:42:00,239 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:01,999 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 14:42:01,999 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:42:01,999 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 14:42:01,999 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:02,006 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:42:02,223 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:04,147 - root - [INFO] - 	!!!Scores: {'accuracy': 0.694, 'average': 0.694}
2024-05-01 14:42:04,147 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:42:04,147 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 14:42:04,147 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:04,155 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:42:04,371 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:06,299 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 14:42:06,299 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:42:06,299 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 14:42:06,299 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:06,307 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:42:06,489 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:08,240 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 14:42:08,241 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:42:08,241 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 14:42:08,241 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:08,248 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:42:08,423 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:10,222 - root - [INFO] - 	!!!Scores: {'accuracy': 0.597, 'average': 0.597}
2024-05-01 14:42:10,222 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:42:10,222 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 14:42:10,222 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:10,230 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:42:10,402 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:12,212 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 14:42:12,212 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:42:12,212 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 14:42:12,212 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:12,220 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:42:12,494 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:14,261 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 14:42:14,261 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:42:14,261 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 14:42:14,261 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:14,269 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:42:14,442 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:16,236 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 14:42:16,237 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 14:42:16,237 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 14:42:16,237 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:16,244 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 14:42:16,534 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 14:42:18,295 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 14:42:18,295 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:42:18,295 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 14:42:18,295 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:18,303 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 14:42:19,242 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:42:19,258 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:42:19,475 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 14:42:21,064 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 14:42:21,064 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:42:21,064 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 14:42:21,064 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:21,072 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:42:21,280 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 14:42:22,896 - root - [INFO] - 	!!!Scores: {'accuracy': 0.809, 'average': 0.809}
2024-05-01 14:42:22,897 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:42:22,897 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 14:42:22,897 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:22,904 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:42:23,113 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 14:42:24,708 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 14:42:24,709 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:42:24,709 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 14:42:24,709 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:24,716 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:42:24,939 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 14:42:26,473 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 14:42:26,473 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:42:26,473 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 14:42:26,473 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:26,481 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:42:26,688 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 14:42:28,287 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 14:42:28,287 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:42:28,287 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 14:42:28,287 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:28,295 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:42:28,505 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 14:42:30,109 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 14:42:30,110 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:42:30,110 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 14:42:30,110 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:30,117 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:42:30,325 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 14:42:31,887 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 14:42:31,887 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 14:42:31,887 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 14:42:31,887 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:31,895 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 14:42:32,102 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 14:42:33,656 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 14:42:33,656 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 14:42:33,656 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 14:42:33,656 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:42:33,669 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 14:42:34,407 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:42:35,099 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 14:42:56,944 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 14:48:11,052 - root - [INFO] - 	!!!Scores: {'accuracy': 0.423, 'average': 0.423}
2024-05-01 14:48:11,052 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:48:11,052 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 14:48:11,052 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:48:11,804 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 14:48:11,917 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:48:17,907 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:48:42,409 - root - [INFO] - 	!!!Scores: {'accuracy': 0.911, 'average': 0.911}
2024-05-01 14:48:42,409 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:48:42,409 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 14:48:42,410 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:48:42,418 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:48:48,489 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:49:13,365 - root - [INFO] - 	!!!Scores: {'accuracy': 0.913, 'average': 0.913}
2024-05-01 14:49:13,366 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:49:13,366 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 14:49:13,366 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:49:13,374 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:49:19,362 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:49:43,931 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 14:49:43,931 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:49:43,931 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 14:49:43,932 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:49:43,940 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:49:49,967 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:50:14,566 - root - [INFO] - 	!!!Scores: {'accuracy': 0.91, 'average': 0.91}
2024-05-01 14:50:14,566 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 14:50:14,566 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 14:50:14,566 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:50:14,574 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 14:50:20,577 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 14:50:45,848 - root - [INFO] - 	!!!Scores: {'accuracy': 0.909, 'average': 0.909}
2024-05-01 14:50:45,848 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:50:45,848 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 14:50:45,848 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:50:46,575 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:50:46,628 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:50:48,429 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:51:06,893 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 14:51:06,893 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:51:06,893 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 14:51:06,893 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:51:06,901 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:51:08,737 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:51:25,302 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 14:51:25,303 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:51:25,303 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 14:51:25,303 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:51:25,312 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:51:27,126 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:51:43,394 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 14:51:43,394 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:51:43,394 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 14:51:43,395 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:51:43,402 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:51:45,204 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:52:01,626 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 14:52:01,626 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:52:01,626 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 14:52:01,626 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:52:01,634 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:52:03,428 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:52:20,217 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 14:52:20,217 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:52:20,217 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 14:52:20,217 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:52:20,225 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:52:22,040 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:52:38,502 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 14:52:38,503 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:52:38,503 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 14:52:38,503 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:52:38,511 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:52:40,896 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:52:59,021 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 14:52:59,021 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:52:59,021 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 14:52:59,021 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:52:59,029 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:53:00,864 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:53:17,566 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 14:53:17,566 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:53:17,566 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 14:53:17,566 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:53:17,574 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:53:19,406 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:53:35,926 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 14:53:35,926 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:53:35,926 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 14:53:35,926 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:53:35,934 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:53:38,304 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:53:55,483 - root - [INFO] - 	!!!Scores: {'accuracy': 0.654, 'average': 0.654}
2024-05-01 14:53:55,483 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:53:55,483 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 14:53:55,483 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:53:55,492 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:53:57,850 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:54:14,692 - root - [INFO] - 	!!!Scores: {'accuracy': 0.664, 'average': 0.664}
2024-05-01 14:54:14,693 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:54:14,693 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 14:54:14,693 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:54:14,701 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:54:16,493 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:54:33,018 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 14:54:33,018 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:54:33,018 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 14:54:33,018 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:54:33,026 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:54:35,373 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:54:52,233 - root - [INFO] - 	!!!Scores: {'accuracy': 0.651, 'average': 0.651}
2024-05-01 14:54:52,233 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:54:52,233 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 14:54:52,233 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:54:52,241 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:54:54,589 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:55:12,315 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 14:55:12,315 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 14:55:12,315 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 14:55:12,315 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:55:12,323 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:55:14,154 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:55:30,685 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 14:55:30,685 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:55:30,685 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 14:55:30,686 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:55:31,638 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 14:55:31,691 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:55:33,484 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:55:51,409 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 14:55:51,409 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:55:51,409 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 14:55:51,409 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:55:51,417 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:55:53,252 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:56:09,500 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 14:56:09,501 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:56:09,501 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 14:56:09,501 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:56:09,509 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:56:11,319 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:56:27,184 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 14:56:27,184 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:56:27,184 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 14:56:27,184 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:56:27,192 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:56:28,987 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:56:45,035 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 14:56:45,035 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:56:45,035 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 14:56:45,035 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:56:45,043 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:56:46,832 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:57:03,297 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 14:57:03,297 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:57:03,297 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 14:57:03,298 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:57:03,305 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:57:05,116 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:57:21,247 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 14:57:21,248 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:57:21,248 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 14:57:21,248 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:57:21,256 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:57:23,623 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:57:41,275 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 14:57:41,276 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:57:41,276 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 14:57:41,276 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:57:41,284 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:57:43,118 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:57:59,516 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 14:57:59,516 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:57:59,516 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 14:57:59,516 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:57:59,524 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:58:01,364 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:58:17,600 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 14:58:17,601 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:58:17,601 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 14:58:17,601 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:58:17,609 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:58:19,974 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:58:36,845 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 14:58:36,845 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:58:36,845 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 14:58:36,845 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:58:36,854 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:58:39,212 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:58:55,779 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 14:58:55,779 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:58:55,779 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 14:58:55,779 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:58:55,788 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:58:57,579 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:59:13,827 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 14:59:13,827 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:59:13,827 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 14:59:13,827 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:59:13,835 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:59:16,388 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:59:32,921 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 14:59:32,921 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:59:32,921 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 14:59:32,921 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:59:32,929 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:59:35,275 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 14:59:52,649 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 14:59:52,649 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 14:59:52,649 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 14:59:52,650 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 14:59:52,657 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 14:59:54,495 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:00:10,740 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 15:00:10,740 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:00:10,740 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 15:00:10,741 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:00:11,718 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:00:11,777 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:00:13,931 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:00:39,234 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 15:00:39,234 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:00:39,234 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 15:00:39,234 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:00:39,242 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:00:41,444 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:01:04,303 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 15:01:04,304 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:01:04,304 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 15:01:04,304 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:01:04,312 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:01:06,485 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:01:28,980 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 15:01:28,980 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:01:28,980 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 15:01:28,980 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:01:28,989 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:01:31,156 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:01:53,812 - root - [INFO] - 	!!!Scores: {'accuracy': 0.509, 'average': 0.509}
2024-05-01 15:01:53,812 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:01:53,812 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 15:01:53,813 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:01:53,820 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:01:55,976 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:02:19,213 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 15:02:19,213 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:02:19,213 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 15:02:19,213 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:02:19,221 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:02:21,396 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:02:44,135 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 15:02:44,136 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:02:44,136 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 15:02:44,136 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:02:44,145 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:02:46,982 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:03:11,845 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 15:03:11,845 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:03:11,846 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 15:03:11,846 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:03:11,853 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:03:14,055 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:03:37,214 - root - [INFO] - 	!!!Scores: {'accuracy': 0.485, 'average': 0.485}
2024-05-01 15:03:37,215 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:03:37,215 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 15:03:37,215 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:03:37,223 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:03:39,430 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:04:02,407 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 15:04:02,407 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:04:02,407 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 15:04:02,407 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:04:02,417 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:04:05,396 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:04:29,254 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 15:04:29,254 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:04:29,254 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 15:04:29,255 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:04:29,263 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:04:32,100 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:04:55,617 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 15:04:55,617 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:04:55,617 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 15:04:55,617 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:04:55,625 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:04:57,787 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:05:20,819 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 15:05:20,819 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:05:20,819 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 15:05:20,819 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:05:20,827 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:05:23,677 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:05:47,215 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 15:05:47,216 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:05:47,216 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 15:05:47,216 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:05:47,225 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:05:50,075 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:06:14,634 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 15:06:14,634 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:06:14,634 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 15:06:14,634 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:06:14,642 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:06:16,857 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:06:39,704 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 15:06:39,781 - root - [INFO] - Unexpected keys: []
2024-05-01 15:06:40,010 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:06:40,011 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 15:06:40,011 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:06:40,019 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 15:06:40,971 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:06:40,995 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:06:41,511 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 15:06:47,270 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 15:06:47,270 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:06:47,270 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 15:06:47,270 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:06:47,278 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:06:47,796 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 15:06:53,248 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 15:06:53,248 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:06:53,248 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 15:06:53,248 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:06:53,256 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:06:53,773 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 15:06:59,218 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 15:06:59,219 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:06:59,219 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 15:06:59,219 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:06:59,226 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:06:59,740 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 15:07:05,137 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 15:07:05,137 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:07:05,137 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 15:07:05,137 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:05,145 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:07:05,659 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 15:07:11,132 - root - [INFO] - 	!!!Scores: {'accuracy': 0.82, 'average': 0.82}
2024-05-01 15:07:11,132 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:07:11,132 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 15:07:11,132 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:11,140 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:07:11,659 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 15:07:17,132 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 15:07:17,132 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:07:17,132 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 15:07:17,132 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:17,140 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:07:17,659 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 15:07:23,066 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 15:07:23,067 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:07:23,067 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 15:07:23,067 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:23,075 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:07:23,589 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 15:07:29,198 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 15:07:29,198 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:07:29,198 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 15:07:29,198 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:29,206 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:07:29,722 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 15:07:35,201 - root - [INFO] - 	!!!Scores: {'accuracy': 0.816, 'average': 0.816}
2024-05-01 15:07:35,201 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:07:35,201 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 15:07:35,201 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:35,209 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:07:35,729 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 15:07:41,281 - root - [INFO] - 	!!!Scores: {'accuracy': 0.816, 'average': 0.816}
2024-05-01 15:07:41,281 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:41,281 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 15:07:41,281 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:41,289 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 15:07:42,034 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:07:42,043 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:42,107 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:43,554 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:07:43,554 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:43,554 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 15:07:43,554 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:43,562 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:43,613 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:45,063 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:07:45,063 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:45,063 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 15:07:45,063 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:45,071 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:45,135 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:46,605 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:07:46,605 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:46,605 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 15:07:46,605 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:46,613 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:46,664 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:48,101 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:07:48,102 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:48,102 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 15:07:48,102 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:48,110 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:48,161 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:49,605 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 15:07:49,605 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:49,605 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 15:07:49,605 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:49,613 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:49,677 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:51,132 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 15:07:51,132 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:51,132 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 15:07:51,132 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:51,140 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:51,191 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:52,633 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 15:07:52,633 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:52,634 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 15:07:52,634 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:52,641 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:52,705 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:54,160 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 15:07:54,160 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:54,160 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 15:07:54,160 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:54,168 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:54,220 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:55,667 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:07:55,667 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:55,668 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 15:07:55,668 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:55,676 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:55,727 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:57,176 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:07:57,176 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:57,177 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 15:07:57,177 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:57,185 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:57,250 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 15:07:58,711 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 15:07:58,711 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:07:58,711 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 15:07:58,711 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:07:58,719 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:07:58,770 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 15:08:00,218 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 15:08:00,219 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:08:00,219 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 15:08:00,219 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:08:00,227 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:08:00,278 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 15:08:01,771 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 15:08:01,771 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:08:01,771 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 15:08:01,772 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:08:01,779 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:08:01,831 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 15:08:03,278 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:08:03,278 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:08:03,278 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 15:08:03,278 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:08:03,286 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:08:03,351 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 15:08:04,835 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 15:08:04,835 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:08:04,835 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 15:08:04,836 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:08:05,565 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:08:05,621 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:08:09,231 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:08:18,964 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 15:08:18,964 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:08:18,964 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 15:08:18,964 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:08:18,972 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:08:22,516 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:08:32,360 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 15:08:32,360 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:08:32,360 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 15:08:32,360 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:08:32,369 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:08:35,984 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:08:45,914 - root - [INFO] - 	!!!Scores: {'accuracy': 0.687, 'average': 0.687}
2024-05-01 15:08:45,915 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:08:45,915 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 15:08:45,915 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:08:45,924 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:08:49,587 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:08:59,754 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 15:08:59,755 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:08:59,755 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 15:08:59,755 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:08:59,763 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:09:03,357 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:09:13,417 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 15:09:13,418 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:09:13,418 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 15:09:13,418 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:09:13,426 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 15:09:14,445 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:09:14,496 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:09:15,991 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 15:09:21,359 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 15:09:21,359 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:09:21,359 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 15:09:21,359 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:09:21,367 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:09:22,863 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 15:09:28,094 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 15:09:28,094 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:09:28,094 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 15:09:28,094 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:09:28,102 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:09:29,604 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 15:09:35,342 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 15:09:35,342 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:09:35,343 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 15:09:35,343 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:09:35,351 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:09:36,846 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 15:09:42,653 - root - [INFO] - 	!!!Scores: {'accuracy': 0.566, 'average': 0.566}
2024-05-01 15:09:42,654 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:09:42,654 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 15:09:42,654 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:09:42,661 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:09:44,142 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 15:09:49,539 - root - [INFO] - 	!!!Scores: {'accuracy': 0.564, 'average': 0.564}
2024-05-01 15:09:49,540 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:09:49,540 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 15:09:49,540 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:09:49,547 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:09:51,043 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 15:09:56,861 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 15:09:56,861 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:09:56,861 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 15:09:56,861 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:09:56,869 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:09:58,361 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 15:10:03,770 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 15:10:03,771 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:10:03,771 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 15:10:03,771 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:03,778 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:10:05,258 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 15:10:10,818 - root - [INFO] - 	!!!Scores: {'accuracy': 0.546, 'average': 0.546}
2024-05-01 15:10:10,818 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:10:10,818 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 15:10:10,819 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:10,826 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:10:12,324 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 15:10:18,392 - root - [INFO] - 	!!!Scores: {'accuracy': 0.622, 'average': 0.622}
2024-05-01 15:10:18,392 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:10:18,392 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 15:10:18,392 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:18,400 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:10:19,880 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 15:10:24,610 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 15:10:24,611 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:24,611 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 15:10:24,611 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:24,619 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 15:10:25,324 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:10:25,340 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:25,529 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:27,318 - root - [INFO] - 	!!!Scores: {'accuracy': 0.583, 'average': 0.583}
2024-05-01 15:10:27,318 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:27,318 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 15:10:27,318 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:27,326 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:27,499 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:29,258 - root - [INFO] - 	!!!Scores: {'accuracy': 0.542, 'average': 0.542}
2024-05-01 15:10:29,258 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:29,258 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 15:10:29,258 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:29,266 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:29,483 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:31,408 - root - [INFO] - 	!!!Scores: {'accuracy': 0.694, 'average': 0.694}
2024-05-01 15:10:31,408 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:31,408 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 15:10:31,408 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:31,416 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:31,631 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:33,559 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 15:10:33,559 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:33,559 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 15:10:33,559 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:33,567 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:33,749 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:35,499 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 15:10:35,500 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:35,500 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 15:10:35,500 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:35,507 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:35,683 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:37,481 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 15:10:37,482 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:37,482 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 15:10:37,482 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:37,489 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:37,662 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:39,473 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 15:10:39,473 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:39,473 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 15:10:39,473 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:39,481 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:39,754 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:41,522 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 15:10:41,522 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:41,522 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 15:10:41,523 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:41,530 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:41,703 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:43,501 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 15:10:43,501 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:10:43,501 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 15:10:43,501 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:43,509 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:10:43,798 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 15:10:45,558 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 15:10:45,558 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:10:45,558 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 15:10:45,558 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:45,566 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 15:10:46,295 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:10:46,310 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:10:46,528 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 15:10:48,114 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 15:10:48,114 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:10:48,114 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 15:10:48,114 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:48,122 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:10:48,330 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 15:10:49,948 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 15:10:49,948 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:10:49,948 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 15:10:49,948 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:49,956 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:10:50,165 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 15:10:51,762 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 15:10:51,763 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:10:51,763 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 15:10:51,763 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:51,770 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:10:51,994 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 15:10:53,526 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 15:10:53,526 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:10:53,526 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 15:10:53,526 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:53,534 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:10:53,742 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 15:10:55,344 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 15:10:55,344 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:10:55,344 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 15:10:55,344 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:55,352 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:10:55,562 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 15:10:57,170 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 15:10:57,170 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:10:57,171 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 15:10:57,171 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:57,178 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:10:57,387 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 15:10:58,949 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 15:10:58,949 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:10:58,949 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 15:10:58,950 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:10:58,957 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:10:59,165 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 15:11:00,719 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 15:11:00,719 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 15:11:00,719 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 15:11:00,720 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:11:00,732 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 15:11:01,458 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:11:02,143 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 15:11:24,464 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 15:16:38,889 - root - [INFO] - 	!!!Scores: {'accuracy': 0.427, 'average': 0.427}
2024-05-01 15:16:38,890 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:16:38,890 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 15:16:38,890 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:16:39,629 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 15:16:39,744 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:16:45,822 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:17:10,347 - root - [INFO] - 	!!!Scores: {'accuracy': 0.928, 'average': 0.928}
2024-05-01 15:17:10,347 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:17:10,347 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 15:17:10,348 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:17:10,357 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:17:16,455 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:17:41,378 - root - [INFO] - 	!!!Scores: {'accuracy': 0.93, 'average': 0.93}
2024-05-01 15:17:41,378 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:17:41,378 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 15:17:41,378 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:17:41,388 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:17:47,418 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:18:12,159 - root - [INFO] - 	!!!Scores: {'accuracy': 0.934, 'average': 0.934}
2024-05-01 15:18:12,159 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:18:12,159 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 15:18:12,160 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:18:12,169 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:18:18,225 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:18:43,006 - root - [INFO] - 	!!!Scores: {'accuracy': 0.924, 'average': 0.924}
2024-05-01 15:18:43,006 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:18:43,006 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 15:18:43,007 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:18:43,016 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:18:49,067 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:19:14,536 - root - [INFO] - 	!!!Scores: {'accuracy': 0.925, 'average': 0.925}
2024-05-01 15:19:14,536 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:19:14,536 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 15:19:14,536 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:19:15,296 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:19:15,348 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:19:17,160 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:19:35,696 - root - [INFO] - 	!!!Scores: {'accuracy': 0.633, 'average': 0.633}
2024-05-01 15:19:35,697 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:19:35,697 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 15:19:35,697 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:19:35,706 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:19:37,562 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:19:54,140 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 15:19:54,140 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:19:54,140 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 15:19:54,140 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:19:54,149 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:19:55,978 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:20:12,273 - root - [INFO] - 	!!!Scores: {'accuracy': 0.68, 'average': 0.68}
2024-05-01 15:20:12,273 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:20:12,273 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 15:20:12,273 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:20:12,282 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:20:14,097 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:20:30,560 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 15:20:30,560 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:20:30,560 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 15:20:30,560 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:20:30,569 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:20:32,380 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:20:49,205 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 15:20:49,206 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:20:49,206 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 15:20:49,206 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:20:49,215 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:20:51,045 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:21:07,556 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 15:21:07,556 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:21:07,556 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 15:21:07,556 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:21:07,565 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:21:09,959 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:21:28,197 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 15:21:28,197 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:21:28,197 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 15:21:28,197 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:21:28,207 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:21:30,060 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:21:46,877 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 15:21:46,877 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:21:46,877 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 15:21:46,877 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:21:46,886 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:21:48,737 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:22:05,362 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 15:22:05,362 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:22:05,362 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 15:22:05,363 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:22:05,371 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:22:07,760 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:22:25,063 - root - [INFO] - 	!!!Scores: {'accuracy': 0.62, 'average': 0.62}
2024-05-01 15:22:25,063 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:22:25,063 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 15:22:25,063 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:22:25,073 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:22:27,452 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:22:44,389 - root - [INFO] - 	!!!Scores: {'accuracy': 0.659, 'average': 0.659}
2024-05-01 15:22:44,389 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:22:44,389 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 15:22:44,389 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:22:44,397 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:22:46,205 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:23:02,859 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 15:23:02,859 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:23:02,860 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 15:23:02,860 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:23:02,869 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:23:05,454 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:23:22,351 - root - [INFO] - 	!!!Scores: {'accuracy': 0.638, 'average': 0.638}
2024-05-01 15:23:22,351 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:23:22,351 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 15:23:22,351 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:23:22,360 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:23:24,719 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:23:42,465 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 15:23:42,465 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:23:42,465 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 15:23:42,465 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:23:42,473 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:23:44,333 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:24:00,888 - root - [INFO] - 	!!!Scores: {'accuracy': 0.684, 'average': 0.684}
2024-05-01 15:24:00,888 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:24:00,888 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 15:24:00,888 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:24:01,888 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:24:01,939 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:24:03,757 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:24:21,687 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 15:24:21,687 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:24:21,687 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 15:24:21,687 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:24:21,695 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:24:23,550 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:24:39,805 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 15:24:39,805 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:24:39,805 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 15:24:39,805 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:24:39,813 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:24:41,647 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:24:57,515 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 15:24:57,515 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:24:57,515 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 15:24:57,516 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:24:57,525 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:24:59,330 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:25:15,387 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 15:25:15,387 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:25:15,387 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 15:25:15,388 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:25:15,395 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:25:17,198 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:25:33,678 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 15:25:33,678 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:25:33,678 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 15:25:33,678 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:25:33,686 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:25:35,508 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:25:51,657 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 15:25:51,657 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:25:51,657 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 15:25:51,657 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:25:51,665 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:25:54,049 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:26:11,708 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 15:26:11,709 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:26:11,709 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 15:26:11,709 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:26:11,718 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:26:13,562 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:26:29,965 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 15:26:29,965 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:26:29,965 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 15:26:29,965 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:26:29,973 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:26:31,813 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:26:48,054 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 15:26:48,054 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:26:48,054 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 15:26:48,054 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:26:48,062 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:26:50,442 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:27:07,308 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 15:27:07,308 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:27:07,308 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 15:27:07,308 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:27:07,316 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:27:09,686 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:27:26,256 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 15:27:26,256 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:27:26,256 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 15:27:26,256 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:27:26,265 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:27:28,062 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:27:44,302 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 15:27:44,302 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:27:44,302 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 15:27:44,303 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:27:44,311 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:27:46,664 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:28:03,204 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 15:28:03,204 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:28:03,204 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 15:28:03,205 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:28:03,212 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:28:05,557 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:28:22,926 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 15:28:22,926 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:28:22,926 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 15:28:22,926 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:28:22,934 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:28:24,770 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:28:41,000 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 15:28:41,000 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:28:41,001 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 15:28:41,001 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:28:41,980 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:28:42,039 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:28:44,197 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:29:09,493 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 15:29:09,494 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:29:09,494 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 15:29:09,494 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:29:09,502 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:29:11,708 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:29:34,564 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 15:29:34,564 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:29:34,564 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 15:29:34,564 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:29:34,573 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:29:36,749 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:29:59,222 - root - [INFO] - 	!!!Scores: {'accuracy': 0.474, 'average': 0.474}
2024-05-01 15:29:59,222 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:29:59,222 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 15:29:59,222 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:29:59,230 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:30:01,388 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:30:24,061 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 15:30:24,061 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:30:24,061 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 15:30:24,061 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:30:24,069 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:30:26,224 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:30:49,462 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 15:30:49,463 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:30:49,463 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 15:30:49,463 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:30:49,471 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:30:51,647 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:31:14,399 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 15:31:14,399 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:31:14,399 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 15:31:14,399 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:31:14,407 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:31:17,252 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:31:42,120 - root - [INFO] - 	!!!Scores: {'accuracy': 0.486, 'average': 0.486}
2024-05-01 15:31:42,120 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:31:42,120 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 15:31:42,121 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:31:42,129 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:31:44,334 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:32:07,474 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 15:32:07,474 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:32:07,474 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 15:32:07,474 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:32:07,482 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:32:09,686 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:32:32,523 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 15:32:32,524 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:32:32,524 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 15:32:32,524 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:32:32,532 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:32:35,374 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:32:59,108 - root - [INFO] - 	!!!Scores: {'accuracy': 0.481, 'average': 0.481}
2024-05-01 15:32:59,109 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:32:59,109 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 15:32:59,109 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:32:59,116 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:33:01,951 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:33:25,316 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 15:33:25,317 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:33:25,317 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 15:33:25,317 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:33:25,325 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:33:27,479 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:33:50,327 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 15:33:50,327 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:33:50,327 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 15:33:50,327 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:33:50,336 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:33:53,155 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:34:16,528 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 15:34:16,528 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:34:16,528 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 15:34:16,528 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:34:16,536 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:34:19,345 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:34:43,733 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 15:34:43,733 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:34:43,734 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 15:34:43,734 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:34:43,741 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:34:45,946 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:35:08,793 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 15:35:08,871 - root - [INFO] - Unexpected keys: []
2024-05-01 15:35:09,116 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:35:09,117 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 15:35:09,117 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:35:09,125 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 15:35:10,076 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:35:10,103 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:35:10,620 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 15:35:16,378 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 15:35:16,378 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:35:16,378 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 15:35:16,378 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:35:16,386 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:35:16,903 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 15:35:22,349 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 15:35:22,350 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:35:22,350 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 15:35:22,350 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:35:22,357 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:35:22,874 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 15:35:28,319 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 15:35:28,319 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:35:28,319 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 15:35:28,319 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:35:28,327 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:35:28,840 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 15:35:34,214 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 15:35:34,214 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:35:34,214 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 15:35:34,214 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:35:34,222 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:35:34,735 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 15:35:40,181 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 15:35:40,182 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:35:40,182 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 15:35:40,182 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:35:40,189 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:35:40,706 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 15:35:46,155 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 15:35:46,155 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:35:46,155 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 15:35:46,155 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:35:46,163 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:35:46,680 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 15:35:52,067 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 15:35:52,067 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:35:52,067 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 15:35:52,067 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:35:52,075 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:35:52,587 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 15:35:58,169 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 15:35:58,169 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:35:58,169 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 15:35:58,169 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:35:58,177 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:35:58,689 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 15:36:04,127 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 15:36:04,128 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 15:36:04,128 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 15:36:04,128 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:04,135 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 15:36:04,652 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 15:36:10,199 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 15:36:10,199 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:10,199 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 15:36:10,199 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:10,207 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 15:36:10,933 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:36:10,943 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:11,008 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:12,453 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:36:12,453 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:12,453 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 15:36:12,453 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:12,461 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:12,512 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:13,957 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:36:13,957 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:13,957 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 15:36:13,958 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:13,965 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:14,029 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:15,494 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 15:36:15,494 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:15,495 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 15:36:15,495 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:15,502 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:15,553 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:16,988 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:36:16,989 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:16,989 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 15:36:16,989 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:16,996 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:17,047 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:18,488 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:36:18,488 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:18,488 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 15:36:18,488 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:18,496 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:18,560 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:20,016 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 15:36:20,016 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:20,016 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 15:36:20,017 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:20,024 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:20,076 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:21,517 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 15:36:21,517 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:21,517 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 15:36:21,517 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:21,525 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:21,589 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:23,040 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 15:36:23,040 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:23,040 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 15:36:23,040 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:23,048 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:23,099 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:24,543 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:36:24,543 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:24,543 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 15:36:24,543 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:24,551 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:24,602 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:26,049 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:36:26,049 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:26,049 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 15:36:26,049 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:26,057 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:26,121 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:27,578 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 15:36:27,578 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:27,578 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 15:36:27,578 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:27,587 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:27,637 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:29,082 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 15:36:29,082 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:29,082 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 15:36:29,082 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:29,090 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:29,141 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:30,629 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 15:36:30,629 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:30,629 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 15:36:30,629 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:30,637 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:30,688 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:32,131 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:36:32,131 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 15:36:32,132 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 15:36:32,132 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:32,139 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 15:36:32,204 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 15:36:33,684 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 15:36:33,684 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:36:33,684 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 15:36:33,684 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:34,432 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:36:34,486 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:36:38,101 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:36:47,730 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 15:36:47,731 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:36:47,731 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 15:36:47,731 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:36:47,739 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:36:51,269 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:37:00,976 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 15:37:00,976 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:37:00,976 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 15:37:00,976 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:37:00,984 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:37:04,710 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:37:14,484 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 15:37:14,484 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:37:14,484 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 15:37:14,484 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:37:14,492 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:37:18,120 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:37:28,211 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 15:37:28,212 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 15:37:28,212 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 15:37:28,212 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:37:28,219 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 15:37:31,829 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 15:37:41,804 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 15:37:41,805 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:37:41,805 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 15:37:41,805 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:37:41,812 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 15:37:42,540 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:37:42,593 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:37:44,088 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 15:37:49,412 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 15:37:49,412 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:37:49,412 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 15:37:49,412 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:37:49,420 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:37:50,907 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 15:37:56,092 - root - [INFO] - 	!!!Scores: {'accuracy': 0.657, 'average': 0.657}
2024-05-01 15:37:56,092 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:37:56,092 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 15:37:56,092 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:37:56,100 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:37:57,594 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 15:38:03,293 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 15:38:03,293 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:38:03,294 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 15:38:03,294 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:03,301 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:38:04,794 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 15:38:10,622 - root - [INFO] - 	!!!Scores: {'accuracy': 0.546, 'average': 0.546}
2024-05-01 15:38:10,622 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:38:10,622 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 15:38:10,622 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:10,630 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:38:12,109 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 15:38:17,519 - root - [INFO] - 	!!!Scores: {'accuracy': 0.558, 'average': 0.558}
2024-05-01 15:38:17,519 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:38:17,520 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 15:38:17,520 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:17,527 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:38:19,021 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 15:38:24,854 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 15:38:24,854 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:38:24,854 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 15:38:24,854 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:24,862 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:38:26,351 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 15:38:31,763 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 15:38:31,763 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:38:31,763 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 15:38:31,763 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:31,771 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:38:33,259 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 15:38:38,826 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 15:38:38,826 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:38:38,826 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 15:38:38,826 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:38,834 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:38:40,331 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 15:38:46,410 - root - [INFO] - 	!!!Scores: {'accuracy': 0.614, 'average': 0.614}
2024-05-01 15:38:46,410 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 15:38:46,410 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 15:38:46,410 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:46,418 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 15:38:47,890 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 15:38:52,617 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 15:38:52,617 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:38:52,617 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 15:38:52,617 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:52,625 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 15:38:54,383 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:38:54,399 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:38:54,586 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 15:38:56,379 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 15:38:56,380 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:38:56,380 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 15:38:56,380 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:56,388 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:38:56,559 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 15:38:58,320 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 15:38:58,320 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:38:58,320 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 15:38:58,320 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:38:58,328 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:38:58,544 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 15:39:00,466 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 15:39:00,466 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:39:00,466 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 15:39:00,466 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:00,474 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:39:00,691 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 15:39:02,619 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 15:39:02,619 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:39:02,619 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 15:39:02,620 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:02,627 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:39:02,809 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 15:39:04,559 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 15:39:04,559 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:39:04,559 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 15:39:04,559 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:04,567 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:39:04,742 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 15:39:06,542 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 15:39:06,542 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:39:06,542 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 15:39:06,542 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:06,550 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:39:06,722 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 15:39:08,532 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 15:39:08,532 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:39:08,532 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 15:39:08,532 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:08,540 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:39:08,812 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 15:39:10,580 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 15:39:10,580 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:39:10,581 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 15:39:10,581 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:10,588 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:39:10,761 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 15:39:12,558 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 15:39:12,558 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 15:39:12,558 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 15:39:12,558 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:12,566 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 15:39:12,854 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 15:39:14,615 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 15:39:14,615 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:39:14,615 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 15:39:14,615 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:14,623 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 15:39:15,346 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:39:15,361 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:39:15,576 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 15:39:17,163 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 15:39:17,163 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:39:17,163 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 15:39:17,164 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:17,171 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:39:17,379 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 15:39:18,996 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 15:39:18,996 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:39:18,996 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 15:39:18,996 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:19,003 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:39:19,212 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 15:39:20,807 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 15:39:20,807 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:39:20,807 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 15:39:20,807 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:20,815 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:39:21,038 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 15:39:22,569 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 15:39:22,569 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:39:22,569 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 15:39:22,569 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:22,577 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:39:22,784 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 15:39:24,383 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 15:39:24,383 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:39:24,383 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 15:39:24,384 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:24,391 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:39:24,601 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 15:39:26,205 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 15:39:26,206 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:39:26,206 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 15:39:26,206 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:26,213 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:39:26,421 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 15:39:27,983 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 15:39:27,983 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 15:39:27,983 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 15:39:27,983 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:27,991 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 15:39:28,198 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 15:39:29,751 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 15:39:29,751 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 15:39:29,751 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 15:39:29,751 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:39:29,764 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 15:39:30,705 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:39:31,391 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 15:39:53,289 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 15:45:06,600 - root - [INFO] - 	!!!Scores: {'accuracy': 0.426, 'average': 0.426}
2024-05-01 15:45:06,600 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:45:06,600 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 15:45:06,600 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:45:07,344 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 15:45:07,455 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:45:13,462 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:45:37,787 - root - [INFO] - 	!!!Scores: {'accuracy': 0.924, 'average': 0.924}
2024-05-01 15:45:37,787 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:45:37,787 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 15:45:37,788 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:45:37,795 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:45:43,833 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:46:08,566 - root - [INFO] - 	!!!Scores: {'accuracy': 0.929, 'average': 0.929}
2024-05-01 15:46:08,566 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:46:08,566 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 15:46:08,566 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:46:08,574 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:46:14,561 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:46:39,144 - root - [INFO] - 	!!!Scores: {'accuracy': 0.93, 'average': 0.93}
2024-05-01 15:46:39,144 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:46:39,144 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 15:46:39,145 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:46:39,153 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:46:45,196 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:47:09,798 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 15:47:09,798 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 15:47:09,799 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 15:47:09,799 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:47:09,808 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 15:47:15,815 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 15:47:41,094 - root - [INFO] - 	!!!Scores: {'accuracy': 0.924, 'average': 0.924}
2024-05-01 15:47:41,094 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:47:41,094 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 15:47:41,095 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:47:41,832 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:47:41,883 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:47:43,686 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:48:02,175 - root - [INFO] - 	!!!Scores: {'accuracy': 0.638, 'average': 0.638}
2024-05-01 15:48:02,175 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:48:02,175 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 15:48:02,175 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:48:02,183 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:48:04,031 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:48:20,570 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 15:48:20,570 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:48:20,570 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 15:48:20,570 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:48:20,578 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:48:22,394 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:48:38,644 - root - [INFO] - 	!!!Scores: {'accuracy': 0.685, 'average': 0.685}
2024-05-01 15:48:38,644 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:48:38,644 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 15:48:38,644 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:48:38,652 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:48:40,453 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:48:56,894 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 15:48:56,894 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:48:56,894 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 15:48:56,894 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:48:56,902 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:48:58,701 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:49:15,485 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 15:49:15,485 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:49:15,485 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 15:49:15,485 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:49:15,494 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:49:17,314 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:49:33,793 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 15:49:33,793 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:49:33,793 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 15:49:33,793 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:49:33,802 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:49:36,182 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:49:54,359 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 15:49:54,359 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:49:54,359 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 15:49:54,359 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:49:54,367 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:49:56,220 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:50:13,008 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 15:50:13,009 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:50:13,009 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 15:50:13,009 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:50:13,017 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:50:14,858 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:50:31,496 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 15:50:31,496 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:50:31,496 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 15:50:31,497 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:50:31,505 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:50:33,886 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:50:51,183 - root - [INFO] - 	!!!Scores: {'accuracy': 0.632, 'average': 0.632}
2024-05-01 15:50:51,184 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:50:51,184 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 15:50:51,184 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:50:51,192 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:50:53,563 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:51:10,517 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 15:51:10,517 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:51:10,517 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 15:51:10,518 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:51:10,526 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:51:12,328 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:51:28,999 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 15:51:28,999 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:51:28,999 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 15:51:28,999 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:51:29,007 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:51:31,370 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:51:48,300 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 15:51:48,300 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:51:48,300 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 15:51:48,300 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:51:48,309 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:51:50,652 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:52:08,422 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 15:52:08,422 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 15:52:08,422 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 15:52:08,422 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:52:08,430 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:52:10,267 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:52:26,915 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 15:52:26,915 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:52:26,915 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 15:52:26,915 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:52:27,886 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:52:27,938 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:52:29,742 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:52:47,754 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 15:52:47,754 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:52:47,755 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 15:52:47,755 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:52:47,763 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:52:49,598 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:53:05,958 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 15:53:05,958 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:53:05,958 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 15:53:05,958 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:53:05,966 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:53:07,790 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:53:23,758 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 15:53:23,758 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:53:23,758 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 15:53:23,758 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:53:23,766 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:53:25,577 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:53:41,763 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 15:53:41,763 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:53:41,763 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 15:53:41,763 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:53:41,771 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:53:43,576 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:54:00,178 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 15:54:00,178 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:54:00,178 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 15:54:00,178 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:54:00,186 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:54:02,001 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:54:18,240 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 15:54:18,241 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:54:18,241 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 15:54:18,241 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:54:18,249 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:54:20,628 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:54:38,397 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 15:54:38,397 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:54:38,397 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 15:54:38,398 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:54:38,406 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:54:40,243 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:54:56,710 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 15:54:56,710 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:54:56,710 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 15:54:56,710 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:54:56,718 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:54:58,556 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:55:14,915 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 15:55:14,915 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:55:14,915 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 15:55:14,915 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:55:14,923 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:55:17,298 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:55:34,291 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 15:55:34,291 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:55:34,291 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 15:55:34,292 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:55:34,300 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:55:36,709 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:55:53,449 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 15:55:53,449 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:55:53,449 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 15:55:53,449 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:55:53,458 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:55:55,263 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:56:11,661 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 15:56:11,662 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:56:11,662 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 15:56:11,662 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:56:11,670 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:56:14,272 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:56:30,977 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 15:56:30,977 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:56:30,977 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 15:56:30,977 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:56:30,985 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:56:33,358 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:56:50,865 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 15:56:50,865 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 15:56:50,865 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 15:56:50,865 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:56:50,873 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 15:56:52,722 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 15:57:09,100 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 15:57:09,100 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:57:09,101 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 15:57:09,101 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:57:10,077 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 15:57:10,137 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:57:12,303 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:57:37,729 - root - [INFO] - 	!!!Scores: {'accuracy': 0.49, 'average': 0.49}
2024-05-01 15:57:37,729 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:57:37,729 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 15:57:37,729 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:57:37,737 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:57:39,958 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:58:02,914 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 15:58:02,914 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:58:02,914 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 15:58:02,914 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:58:02,922 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:58:05,102 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:58:27,737 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 15:58:27,738 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:58:27,738 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 15:58:27,738 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:58:27,747 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:58:29,912 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:58:52,742 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 15:58:52,742 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:58:52,742 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 15:58:52,742 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:58:52,751 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:58:54,911 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:59:18,246 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 15:59:18,246 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:59:18,246 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 15:59:18,246 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:59:18,254 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:59:20,440 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 15:59:43,231 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 15:59:43,232 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 15:59:43,232 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 15:59:43,232 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 15:59:43,239 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 15:59:46,088 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:00:11,062 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 16:00:11,062 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:00:11,062 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 16:00:11,063 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:00:11,072 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:00:13,289 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:00:36,575 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 16:00:36,576 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:00:36,576 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 16:00:36,576 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:00:36,584 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:00:38,789 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:01:01,806 - root - [INFO] - 	!!!Scores: {'accuracy': 0.477, 'average': 0.477}
2024-05-01 16:01:01,807 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:01:01,807 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 16:01:01,807 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:01:01,815 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:01:04,669 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:01:28,491 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 16:01:28,491 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:01:28,491 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 16:01:28,491 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:01:28,500 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:01:31,350 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:01:54,860 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 16:01:54,860 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:01:54,860 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 16:01:54,860 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:01:54,870 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:01:57,041 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:02:20,070 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 16:02:20,070 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:02:20,070 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 16:02:20,070 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:02:20,078 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:02:22,931 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:02:46,504 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 16:02:46,504 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:02:46,504 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 16:02:46,504 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:02:46,512 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:02:49,363 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:03:13,945 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 16:03:13,945 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:03:13,945 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 16:03:13,945 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:03:13,953 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:03:16,194 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:03:39,251 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 16:03:39,334 - root - [INFO] - Unexpected keys: []
2024-05-01 16:03:39,576 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:03:39,576 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 16:03:39,576 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:03:39,585 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 16:03:41,325 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:03:41,349 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:03:41,870 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 16:03:47,655 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 16:03:47,655 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:03:47,655 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 16:03:47,656 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:03:47,664 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:03:48,185 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 16:03:53,662 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 16:03:53,662 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:03:53,662 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 16:03:53,662 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:03:53,670 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:03:54,192 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 16:03:59,679 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 16:03:59,679 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:03:59,679 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 16:03:59,679 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:03:59,688 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:04:00,206 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 16:04:05,609 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 16:04:05,609 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:04:05,609 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 16:04:05,609 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:05,617 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:04:06,134 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 16:04:11,617 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 16:04:11,617 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:04:11,617 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 16:04:11,617 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:11,626 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:04:12,148 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 16:04:17,627 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 16:04:17,627 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:04:17,627 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 16:04:17,627 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:17,636 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:04:18,159 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 16:04:23,575 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 16:04:23,576 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:04:23,576 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 16:04:23,576 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:23,584 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:04:24,102 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 16:04:29,711 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 16:04:29,711 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:04:29,712 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 16:04:29,712 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:29,720 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:04:30,236 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 16:04:35,708 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 16:04:35,708 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:04:35,708 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 16:04:35,708 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:35,717 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:04:36,250 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 16:04:41,817 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 16:04:41,817 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:41,817 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 16:04:41,817 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:41,825 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 16:04:42,783 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:04:42,792 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:42,856 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:44,304 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:04:44,304 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:44,304 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 16:04:44,304 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:44,312 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:44,363 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:45,813 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:04:45,813 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:45,813 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 16:04:45,813 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:45,821 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:45,886 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:47,357 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 16:04:47,357 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:47,357 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 16:04:47,357 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:47,365 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:47,417 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:48,856 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:04:48,856 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:48,856 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 16:04:48,857 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:48,865 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:48,916 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:50,361 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:04:50,361 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:50,361 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 16:04:50,361 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:50,369 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:50,437 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:51,889 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 16:04:51,889 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:51,889 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 16:04:51,889 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:51,897 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:51,949 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:53,391 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 16:04:53,391 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:53,391 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 16:04:53,391 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:53,399 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:53,464 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:54,917 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 16:04:54,917 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:54,917 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 16:04:54,917 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:54,925 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:54,977 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:56,424 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:04:56,424 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:56,424 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 16:04:56,425 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:56,433 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:56,484 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:57,935 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:04:57,935 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:57,935 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 16:04:57,935 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:57,943 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:58,009 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 16:04:59,469 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 16:04:59,469 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:04:59,469 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 16:04:59,470 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:04:59,478 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:04:59,530 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 16:05:00,977 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 16:05:00,977 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:05:00,977 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 16:05:00,977 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:05:00,985 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:05:01,037 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 16:05:02,529 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 16:05:02,529 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:05:02,529 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 16:05:02,530 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:05:02,538 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:05:02,590 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 16:05:04,037 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:05:04,038 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:05:04,038 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 16:05:04,038 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:05:04,046 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:05:04,111 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 16:05:05,594 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:05:05,594 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:05:05,594 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 16:05:05,595 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:05:06,541 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:05:06,597 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:05:10,228 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:05:19,983 - root - [INFO] - 	!!!Scores: {'accuracy': 0.68, 'average': 0.68}
2024-05-01 16:05:19,983 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:05:19,983 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 16:05:19,983 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:05:19,991 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:05:23,544 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:05:33,386 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 16:05:33,386 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:05:33,386 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 16:05:33,386 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:05:33,394 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:05:37,028 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:05:46,933 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 16:05:46,933 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:05:46,933 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 16:05:46,933 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:05:46,942 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:05:50,804 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:06:01,045 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 16:06:01,045 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:06:01,045 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 16:06:01,045 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:06:01,055 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:06:04,672 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:06:14,787 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 16:06:14,787 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:06:14,788 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 16:06:14,788 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:06:14,797 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 16:06:15,757 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:06:15,809 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:06:17,312 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 16:06:22,700 - root - [INFO] - 	!!!Scores: {'accuracy': 0.644, 'average': 0.644}
2024-05-01 16:06:22,700 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:06:22,700 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 16:06:22,700 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:06:22,710 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:06:24,213 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 16:06:29,465 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 16:06:29,466 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:06:29,466 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 16:06:29,466 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:06:29,475 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:06:30,986 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 16:06:36,748 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 16:06:36,749 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:06:36,749 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 16:06:36,749 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:06:36,757 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:06:38,263 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 16:06:44,132 - root - [INFO] - 	!!!Scores: {'accuracy': 0.538, 'average': 0.538}
2024-05-01 16:06:44,132 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:06:44,132 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 16:06:44,132 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:06:44,142 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:06:45,644 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 16:06:51,109 - root - [INFO] - 	!!!Scores: {'accuracy': 0.541, 'average': 0.541}
2024-05-01 16:06:51,109 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:06:51,109 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 16:06:51,109 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:06:51,117 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:06:52,647 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 16:06:58,541 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 16:06:58,542 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:06:58,542 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 16:06:58,542 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:06:58,551 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:07:00,061 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 16:07:05,531 - root - [INFO] - 	!!!Scores: {'accuracy': 0.66, 'average': 0.66}
2024-05-01 16:07:05,531 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:07:05,531 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 16:07:05,531 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:05,540 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:07:07,034 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 16:07:12,640 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 16:07:12,641 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:07:12,641 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 16:07:12,641 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:12,654 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:07:14,154 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 16:07:20,257 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 16:07:20,257 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:07:20,258 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 16:07:20,258 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:20,267 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:07:21,749 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 16:07:26,477 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 16:07:26,477 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:26,477 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 16:07:26,477 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:26,487 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 16:07:27,482 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:07:27,498 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:27,687 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:29,481 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 16:07:29,481 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:29,481 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 16:07:29,482 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:29,489 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:29,662 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:31,426 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 16:07:31,426 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:31,426 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 16:07:31,426 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:31,434 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:31,651 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:33,582 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 16:07:33,583 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:33,583 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 16:07:33,583 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:33,591 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:33,809 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:35,747 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 16:07:35,747 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:35,747 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 16:07:35,747 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:35,756 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:35,939 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:37,695 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 16:07:37,695 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:37,695 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 16:07:37,695 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:37,703 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:37,878 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:39,681 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 16:07:39,681 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:39,681 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 16:07:39,681 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:39,689 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:39,862 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:41,675 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 16:07:41,676 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:41,676 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 16:07:41,676 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:41,683 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:41,958 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:43,734 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 16:07:43,734 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:43,734 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 16:07:43,734 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:43,743 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:43,917 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:45,719 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 16:07:45,719 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:07:45,719 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 16:07:45,719 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:45,727 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:07:46,016 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 16:07:47,782 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 16:07:47,782 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:07:47,782 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 16:07:47,782 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:47,790 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 16:07:48,753 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:07:48,769 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:07:48,986 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 16:07:50,574 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 16:07:50,574 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:07:50,574 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 16:07:50,574 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:50,582 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:07:50,791 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 16:07:52,411 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 16:07:52,411 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:07:52,411 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 16:07:52,411 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:52,419 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:07:52,628 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 16:07:54,232 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 16:07:54,232 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:07:54,232 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 16:07:54,232 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:54,241 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:07:54,465 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 16:07:56,001 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 16:07:56,001 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:07:56,001 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 16:07:56,002 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:56,009 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:07:56,217 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 16:07:57,818 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 16:07:57,818 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:07:57,819 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 16:07:57,819 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:57,827 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:07:58,036 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 16:07:59,644 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 16:07:59,645 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:07:59,645 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 16:07:59,645 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:07:59,653 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:07:59,860 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 16:08:01,424 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 16:08:01,424 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:08:01,424 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 16:08:01,424 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:08:01,432 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:08:01,640 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 16:08:03,202 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 16:08:03,202 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 16:08:03,202 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 16:08:03,202 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:08:03,216 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 16:08:04,175 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:08:05,049 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 16:08:27,154 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 16:13:41,742 - root - [INFO] - 	!!!Scores: {'accuracy': 0.425, 'average': 0.425}
2024-05-01 16:13:41,742 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:13:41,742 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 16:13:41,742 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:13:42,494 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 16:13:42,607 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:13:48,629 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:14:13,251 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 16:14:13,251 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:14:13,251 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 16:14:13,251 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:14:13,262 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:14:19,406 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:14:44,608 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 16:14:44,609 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:14:44,609 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 16:14:44,609 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:14:44,623 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:14:50,686 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:15:15,660 - root - [INFO] - 	!!!Scores: {'accuracy': 0.927, 'average': 0.927}
2024-05-01 16:15:15,660 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:15:15,660 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 16:15:15,660 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:15:15,671 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:15:21,775 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:15:46,715 - root - [INFO] - 	!!!Scores: {'accuracy': 0.916, 'average': 0.916}
2024-05-01 16:15:46,716 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:15:46,716 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 16:15:46,716 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:15:46,726 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:15:52,807 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:16:18,428 - root - [INFO] - 	!!!Scores: {'accuracy': 0.918, 'average': 0.918}
2024-05-01 16:16:18,428 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:16:18,428 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 16:16:18,429 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:16:19,411 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:16:19,462 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:16:21,310 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:16:39,953 - root - [INFO] - 	!!!Scores: {'accuracy': 0.641, 'average': 0.641}
2024-05-01 16:16:39,953 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:16:39,953 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 16:16:39,953 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:16:39,963 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:16:41,827 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:16:58,540 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 16:16:58,541 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:16:58,541 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 16:16:58,541 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:16:58,550 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:17:00,376 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:17:16,747 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 16:17:16,747 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:17:16,747 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 16:17:16,747 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:17:16,757 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:17:18,564 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:17:35,110 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 16:17:35,111 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:17:35,111 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 16:17:35,111 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:17:35,120 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:17:36,918 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:17:53,730 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 16:17:53,731 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:17:53,731 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 16:17:53,731 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:17:53,740 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:17:55,560 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:18:12,064 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 16:18:12,064 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:18:12,064 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 16:18:12,064 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:18:12,073 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:18:14,462 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:18:32,627 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 16:18:32,627 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:18:32,627 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 16:18:32,628 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:18:32,636 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:18:34,476 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:18:51,187 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 16:18:51,188 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:18:51,188 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 16:18:51,188 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:18:51,197 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:18:53,031 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:19:09,571 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 16:19:09,571 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:19:09,571 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 16:19:09,571 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:19:09,580 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:19:12,137 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:19:29,300 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 16:19:29,300 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:19:29,300 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 16:19:29,300 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:19:29,308 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:19:31,696 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:19:48,560 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 16:19:48,560 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:19:48,560 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 16:19:48,560 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:19:48,569 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:19:50,375 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:20:06,917 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 16:20:06,917 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:20:06,917 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 16:20:06,918 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:20:06,927 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:20:09,285 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:20:26,148 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 16:20:26,148 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:20:26,148 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 16:20:26,148 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:20:26,157 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:20:28,509 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:20:46,251 - root - [INFO] - 	!!!Scores: {'accuracy': 0.663, 'average': 0.663}
2024-05-01 16:20:46,251 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:20:46,251 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 16:20:46,251 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:20:46,260 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:20:48,103 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:21:04,631 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 16:21:04,632 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:21:04,632 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 16:21:04,632 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:21:05,373 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:21:05,425 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:21:07,224 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:21:25,133 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 16:21:25,133 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:21:25,133 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 16:21:25,133 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:21:25,141 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:21:26,979 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:21:43,217 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 16:21:43,217 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:21:43,217 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 16:21:43,217 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:21:43,226 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:21:45,039 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:22:00,900 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 16:22:00,900 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:22:00,900 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 16:22:00,900 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:22:00,908 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:22:02,708 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:22:18,750 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 16:22:18,750 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:22:18,750 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 16:22:18,751 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:22:18,758 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:22:20,552 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:22:37,005 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 16:22:37,005 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:22:37,005 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 16:22:37,005 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:22:37,013 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:22:38,827 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:22:54,929 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 16:22:54,929 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:22:54,929 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 16:22:54,930 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:22:54,939 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:22:57,310 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:23:14,954 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 16:23:14,954 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:23:14,954 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 16:23:14,954 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:23:14,962 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:23:16,798 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:23:33,182 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 16:23:33,182 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:23:33,182 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 16:23:33,182 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:23:33,190 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:23:35,025 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:23:51,252 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 16:23:51,252 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:23:51,252 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 16:23:51,252 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:23:51,260 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:23:53,632 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:24:10,491 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 16:24:10,491 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:24:10,491 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 16:24:10,491 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:24:10,500 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:24:12,865 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:24:29,437 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 16:24:29,437 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:24:29,437 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 16:24:29,437 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:24:29,445 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:24:31,242 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:24:47,470 - root - [INFO] - 	!!!Scores: {'accuracy': 0.524, 'average': 0.524}
2024-05-01 16:24:47,470 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:24:47,470 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 16:24:47,471 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:24:47,478 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:24:49,834 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:25:06,369 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 16:25:06,369 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:25:06,369 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 16:25:06,369 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:25:06,377 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:25:08,718 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:25:26,081 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 16:25:26,081 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:25:26,081 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 16:25:26,081 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:25:26,089 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:25:27,925 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:25:44,165 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 16:25:44,165 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:25:44,165 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 16:25:44,165 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:25:44,911 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:25:44,972 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:25:47,131 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:26:12,439 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 16:26:12,440 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:26:12,440 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 16:26:12,440 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:26:12,448 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:26:14,652 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:26:37,493 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 16:26:37,493 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:26:37,493 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 16:26:37,493 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:26:37,501 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:26:39,675 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:27:02,138 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 16:27:02,138 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:27:02,138 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 16:27:02,138 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:27:02,146 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:27:04,303 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:27:26,974 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 16:27:26,975 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:27:26,975 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 16:27:26,975 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:27:26,983 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:27:29,135 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:27:52,405 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 16:27:52,405 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:27:52,405 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 16:27:52,406 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:27:52,414 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:27:54,591 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:28:17,334 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 16:28:17,334 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:28:17,334 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 16:28:17,334 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:28:17,342 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:28:20,186 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:28:45,037 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 16:28:45,037 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:28:45,037 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 16:28:45,037 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:28:45,045 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:28:47,246 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:29:10,379 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 16:29:10,379 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:29:10,379 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 16:29:10,379 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:29:10,387 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:29:12,588 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:29:35,438 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 16:29:35,438 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:29:35,438 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 16:29:35,438 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:29:35,446 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:29:38,293 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:30:02,003 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 16:30:02,003 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:30:02,003 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 16:30:02,003 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:30:02,012 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:30:04,847 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:30:28,218 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 16:30:28,218 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:30:28,219 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 16:30:28,219 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:30:28,227 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:30:30,396 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:30:53,228 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 16:30:53,228 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:30:53,228 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 16:30:53,228 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:30:53,236 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:30:56,054 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:31:19,403 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 16:31:19,403 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:31:19,403 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 16:31:19,403 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:31:19,411 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:31:22,217 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:31:46,594 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 16:31:46,594 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:31:46,594 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 16:31:46,595 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:31:46,602 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:31:48,808 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:32:11,647 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 16:32:11,720 - root - [INFO] - Unexpected keys: []
2024-05-01 16:32:11,959 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:32:11,959 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 16:32:11,959 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:32:11,967 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 16:32:12,694 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:32:12,718 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:32:13,234 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 16:32:18,987 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 16:32:18,988 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:32:18,988 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 16:32:18,988 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:32:18,995 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:32:19,513 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 16:32:24,957 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 16:32:24,957 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:32:24,957 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 16:32:24,958 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:32:24,965 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:32:25,482 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 16:32:30,926 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 16:32:30,926 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:32:30,927 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 16:32:30,927 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:32:30,934 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:32:31,447 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 16:32:36,817 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 16:32:36,818 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:32:36,818 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 16:32:36,818 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:32:36,826 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:32:37,339 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 16:32:42,781 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 16:32:42,781 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:32:42,781 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 16:32:42,782 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:32:42,789 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:32:43,306 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 16:32:48,751 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 16:32:48,751 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:32:48,751 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 16:32:48,751 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:32:48,759 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:32:49,276 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 16:32:54,657 - root - [INFO] - 	!!!Scores: {'accuracy': 0.763, 'average': 0.763}
2024-05-01 16:32:54,657 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:32:54,657 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 16:32:54,657 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:32:54,665 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:32:55,177 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 16:33:00,755 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 16:33:00,755 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:33:00,755 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 16:33:00,755 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:00,763 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:33:01,276 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 16:33:06,713 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 16:33:06,713 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 16:33:06,713 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 16:33:06,713 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:06,721 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 16:33:07,240 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 16:33:12,779 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 16:33:12,779 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:12,779 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 16:33:12,779 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:12,787 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 16:33:13,507 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:33:13,516 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:13,580 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:15,023 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:33:15,024 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:15,024 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 16:33:15,024 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:15,031 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:15,082 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:16,529 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 16:33:16,530 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:16,530 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 16:33:16,530 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:16,537 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:16,601 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:18,069 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 16:33:18,069 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:18,069 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 16:33:18,069 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:18,076 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:18,127 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:19,560 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:33:19,560 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:19,560 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 16:33:19,561 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:19,568 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:19,618 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:21,059 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:33:21,059 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:21,059 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 16:33:21,059 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:21,066 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:21,130 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:22,580 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 16:33:22,580 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:22,580 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 16:33:22,580 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:22,587 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:22,637 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:24,076 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 16:33:24,076 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:24,076 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 16:33:24,076 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:24,083 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:24,147 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:25,596 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 16:33:25,596 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:25,596 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 16:33:25,596 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:25,603 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:25,654 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:27,096 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:33:27,096 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:27,096 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 16:33:27,096 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:27,103 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:27,154 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:28,598 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:33:28,598 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:28,598 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 16:33:28,599 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:28,606 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:28,670 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:30,124 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 16:33:30,124 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:30,124 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 16:33:30,124 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:30,131 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:30,182 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:31,624 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 16:33:31,624 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:31,624 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 16:33:31,624 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:31,632 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:31,683 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:33,169 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 16:33:33,169 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:33,169 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 16:33:33,169 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:33,176 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:33,227 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:34,670 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:33:34,670 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 16:33:34,670 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 16:33:34,670 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:34,678 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 16:33:34,742 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 16:33:36,220 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 16:33:36,220 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:33:36,220 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 16:33:36,220 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:36,940 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:33:36,996 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:33:40,707 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:33:50,337 - root - [INFO] - 	!!!Scores: {'accuracy': 0.682, 'average': 0.682}
2024-05-01 16:33:50,337 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:33:50,337 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 16:33:50,337 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:33:50,345 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:33:53,876 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:34:03,573 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 16:34:03,574 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:34:03,574 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 16:34:03,574 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:34:03,581 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:34:07,177 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:34:16,952 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 16:34:16,952 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:34:16,952 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 16:34:16,952 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:34:16,960 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:34:20,589 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:34:30,688 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 16:34:30,689 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 16:34:30,689 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 16:34:30,689 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:34:30,696 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 16:34:34,294 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 16:34:44,267 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 16:34:44,267 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:34:44,267 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 16:34:44,267 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:34:44,275 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 16:34:45,006 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:34:45,061 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:34:46,552 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 16:34:51,873 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 16:34:51,873 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:34:51,873 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 16:34:51,873 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:34:51,880 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:34:53,371 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 16:34:58,557 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 16:34:58,557 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:34:58,557 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 16:34:58,558 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:34:58,565 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:35:00,061 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 16:35:05,758 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 16:35:05,759 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:35:05,759 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 16:35:05,759 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:05,767 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:35:07,259 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 16:35:13,085 - root - [INFO] - 	!!!Scores: {'accuracy': 0.53, 'average': 0.53}
2024-05-01 16:35:13,085 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:35:13,085 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 16:35:13,085 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:13,093 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:35:14,570 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 16:35:19,976 - root - [INFO] - 	!!!Scores: {'accuracy': 0.536, 'average': 0.536}
2024-05-01 16:35:19,976 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:35:19,976 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 16:35:19,976 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:19,984 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:35:21,478 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 16:35:27,316 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 16:35:27,316 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:35:27,317 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 16:35:27,317 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:27,324 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:35:28,811 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 16:35:34,218 - root - [INFO] - 	!!!Scores: {'accuracy': 0.662, 'average': 0.662}
2024-05-01 16:35:34,218 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:35:34,219 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 16:35:34,219 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:34,226 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:35:35,701 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 16:35:41,261 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 16:35:41,261 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:35:41,261 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 16:35:41,261 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:41,269 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:35:42,759 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 16:35:48,835 - root - [INFO] - 	!!!Scores: {'accuracy': 0.627, 'average': 0.627}
2024-05-01 16:35:48,835 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 16:35:48,835 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 16:35:48,835 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:48,843 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 16:35:50,317 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 16:35:55,045 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 16:35:55,045 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:35:55,045 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 16:35:55,045 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:55,053 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 16:35:55,788 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:35:55,805 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:35:55,994 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 16:35:57,782 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 16:35:57,782 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:35:57,782 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 16:35:57,782 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:57,790 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:35:57,961 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 16:35:59,720 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 16:35:59,720 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:35:59,720 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 16:35:59,720 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:35:59,727 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:35:59,942 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 16:36:01,866 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 16:36:01,866 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:36:01,866 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 16:36:01,867 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:01,874 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:36:02,096 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 16:36:04,023 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 16:36:04,023 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:36:04,023 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 16:36:04,023 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:04,030 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:36:04,211 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 16:36:05,960 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 16:36:05,960 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:36:05,960 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 16:36:05,960 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:05,967 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:36:06,141 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 16:36:07,940 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 16:36:07,940 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:36:07,940 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 16:36:07,940 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:07,947 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:36:08,119 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 16:36:09,928 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 16:36:09,928 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:36:09,928 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 16:36:09,928 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:09,936 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:36:10,207 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 16:36:11,975 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 16:36:11,975 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:36:11,975 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 16:36:11,975 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:11,983 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:36:12,155 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 16:36:13,951 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 16:36:13,951 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 16:36:13,951 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 16:36:13,952 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:13,959 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 16:36:14,246 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 16:36:16,006 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 16:36:16,006 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:36:16,006 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 16:36:16,006 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:16,014 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 16:36:16,742 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:36:16,757 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:36:16,973 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 16:36:18,558 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 16:36:18,558 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:36:18,559 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 16:36:18,559 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:18,566 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:36:18,773 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 16:36:20,389 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 16:36:20,389 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:36:20,389 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 16:36:20,389 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:20,396 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:36:20,604 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 16:36:22,198 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 16:36:22,198 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:36:22,198 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 16:36:22,198 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:22,206 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:36:22,427 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 16:36:23,959 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 16:36:23,959 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:36:23,959 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 16:36:23,959 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:23,966 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:36:24,172 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 16:36:25,771 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 16:36:25,771 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:36:25,771 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 16:36:25,771 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:25,778 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:36:25,987 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 16:36:27,592 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 16:36:27,592 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:36:27,592 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 16:36:27,592 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:27,599 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:36:27,806 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 16:36:29,368 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 16:36:29,368 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 16:36:29,368 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 16:36:29,368 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:29,376 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 16:36:29,583 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 16:36:31,137 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 16:36:31,137 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 16:36:31,137 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 16:36:31,137 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:36:31,150 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 16:36:31,895 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:36:32,582 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 16:36:54,576 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 16:42:07,664 - root - [INFO] - 	!!!Scores: {'accuracy': 0.425, 'average': 0.425}
2024-05-01 16:42:07,664 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:42:07,664 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 16:42:07,664 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:42:08,413 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 16:42:08,528 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:42:14,511 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:42:38,815 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 16:42:38,815 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:42:38,815 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 16:42:38,815 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:42:38,823 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:42:44,880 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:43:09,613 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 16:43:09,614 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:43:09,614 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 16:43:09,614 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:43:09,622 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:43:15,618 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:43:40,203 - root - [INFO] - 	!!!Scores: {'accuracy': 0.928, 'average': 0.928}
2024-05-01 16:43:40,203 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:43:40,203 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 16:43:40,203 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:43:40,212 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:43:46,258 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:44:10,878 - root - [INFO] - 	!!!Scores: {'accuracy': 0.915, 'average': 0.915}
2024-05-01 16:44:10,878 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 16:44:10,878 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 16:44:10,879 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:44:10,887 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 16:44:16,903 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 16:44:42,200 - root - [INFO] - 	!!!Scores: {'accuracy': 0.916, 'average': 0.916}
2024-05-01 16:44:42,200 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:44:42,200 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 16:44:42,200 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:44:42,975 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:44:43,029 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:44:44,829 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:45:03,301 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 16:45:03,301 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:45:03,301 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 16:45:03,301 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:45:03,309 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:45:05,148 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:45:21,690 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 16:45:21,690 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:45:21,690 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 16:45:21,690 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:45:21,698 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:45:23,517 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:45:39,776 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 16:45:39,776 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:45:39,776 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 16:45:39,776 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:45:39,784 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:45:41,587 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:45:58,023 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 16:45:58,024 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:45:58,024 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 16:45:58,024 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:45:58,033 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:45:59,832 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:46:16,623 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 16:46:16,623 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:46:16,624 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 16:46:16,624 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:46:16,631 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:46:18,452 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:46:34,917 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 16:46:34,918 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:46:34,918 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 16:46:34,918 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:46:34,926 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:46:37,307 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:46:55,431 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 16:46:55,431 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:46:55,431 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 16:46:55,431 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:46:55,439 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:46:57,276 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:47:13,970 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 16:47:13,971 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:47:13,971 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 16:47:13,971 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:47:13,979 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:47:15,815 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:47:32,332 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 16:47:32,332 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:47:32,332 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 16:47:32,332 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:47:32,340 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:47:34,710 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:47:51,860 - root - [INFO] - 	!!!Scores: {'accuracy': 0.634, 'average': 0.634}
2024-05-01 16:47:51,860 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:47:51,860 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 16:47:51,860 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:47:51,868 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:47:54,234 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:48:11,075 - root - [INFO] - 	!!!Scores: {'accuracy': 0.654, 'average': 0.654}
2024-05-01 16:48:11,076 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:48:11,076 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 16:48:11,076 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:48:11,084 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:48:12,882 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:48:29,404 - root - [INFO] - 	!!!Scores: {'accuracy': 0.663, 'average': 0.663}
2024-05-01 16:48:29,405 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:48:29,405 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 16:48:29,405 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:48:29,413 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:48:31,763 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:48:48,626 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 16:48:48,626 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:48:48,626 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 16:48:48,626 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:48:48,634 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:48:50,977 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:49:08,700 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 16:49:08,700 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 16:49:08,700 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 16:49:08,700 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:49:08,708 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:49:10,562 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:49:27,084 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 16:49:27,085 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:49:27,085 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 16:49:27,085 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:49:28,047 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:49:28,099 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:49:29,909 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:49:47,819 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 16:49:47,819 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:49:47,819 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 16:49:47,819 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:49:47,827 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:49:49,661 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:50:05,897 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 16:50:05,898 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:50:05,898 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 16:50:05,898 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:50:05,906 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:50:07,721 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:50:23,576 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 16:50:23,576 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:50:23,576 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 16:50:23,577 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:50:23,585 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:50:25,387 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:50:41,427 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 16:50:41,427 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:50:41,427 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 16:50:41,427 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:50:41,436 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:50:43,229 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:50:59,683 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 16:50:59,683 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:50:59,683 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 16:50:59,683 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:50:59,691 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:51:01,506 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:51:17,607 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 16:51:17,607 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:51:17,607 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 16:51:17,608 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:51:17,615 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:51:19,984 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:51:37,619 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 16:51:37,619 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:51:37,619 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 16:51:37,619 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:51:37,627 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:51:39,648 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:51:56,051 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 16:51:56,051 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:51:56,051 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 16:51:56,051 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:51:56,059 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:51:57,905 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:52:14,130 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 16:52:14,130 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:52:14,130 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 16:52:14,131 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:52:14,138 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:52:16,514 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:52:33,371 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 16:52:33,371 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:52:33,371 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 16:52:33,371 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:52:33,379 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:52:35,741 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:52:52,297 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 16:52:52,297 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:52:52,297 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 16:52:52,297 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:52:52,305 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:52:54,102 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:53:10,336 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 16:53:10,337 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:53:10,337 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 16:53:10,337 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:53:10,345 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:53:12,717 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:53:29,255 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 16:53:29,256 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:53:29,256 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 16:53:29,256 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:53:29,264 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:53:31,607 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:53:48,971 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 16:53:48,972 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 16:53:48,972 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 16:53:48,972 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:53:48,980 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 16:53:50,815 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 16:54:07,051 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 16:54:07,051 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:54:07,051 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 16:54:07,052 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:54:08,027 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 16:54:08,086 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:54:10,245 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:54:35,524 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 16:54:35,524 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:54:35,524 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 16:54:35,524 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:54:35,533 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:54:37,737 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:55:00,583 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 16:55:00,583 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:55:00,583 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 16:55:00,583 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:55:00,591 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:55:02,773 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:55:25,231 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 16:55:25,231 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:55:25,231 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 16:55:25,232 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:55:25,239 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:55:27,395 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:55:50,073 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 16:55:50,074 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:55:50,074 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 16:55:50,074 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:55:50,083 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:55:52,234 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:56:15,461 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 16:56:15,461 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:56:15,461 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 16:56:15,461 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:56:15,469 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:56:17,641 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:56:40,464 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 16:56:40,464 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:56:40,464 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 16:56:40,464 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:56:40,472 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:56:43,324 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:57:08,203 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 16:57:08,203 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:57:08,203 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 16:57:08,203 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:57:08,211 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:57:10,416 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:57:33,571 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 16:57:33,571 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:57:33,572 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 16:57:33,572 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:57:33,581 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:57:35,785 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:57:58,630 - root - [INFO] - 	!!!Scores: {'accuracy': 0.485, 'average': 0.485}
2024-05-01 16:57:58,630 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:57:58,630 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 16:57:58,630 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:57:58,638 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:58:01,486 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:58:25,278 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 16:58:25,278 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:58:25,278 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 16:58:25,278 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:58:25,286 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:58:28,130 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:58:51,653 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 16:58:51,653 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:58:51,653 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 16:58:51,653 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:58:51,661 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:58:53,824 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:59:16,838 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 16:59:16,838 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:59:16,838 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 16:59:16,838 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:59:16,846 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:59:19,682 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 16:59:43,219 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 16:59:43,220 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 16:59:43,220 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 16:59:43,220 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 16:59:43,229 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 16:59:46,066 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:00:10,644 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 17:00:10,644 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:00:10,644 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 17:00:10,645 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:00:10,653 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:00:12,889 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:00:35,950 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 17:00:36,026 - root - [INFO] - Unexpected keys: []
2024-05-01 17:00:36,265 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:00:36,265 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 17:00:36,265 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:00:36,274 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 17:00:37,247 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:00:37,270 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:00:37,791 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 17:00:43,583 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 17:00:43,583 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:00:43,583 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 17:00:43,583 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:00:43,591 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:00:44,114 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 17:00:49,591 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 17:00:49,592 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:00:49,592 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 17:00:49,592 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:00:49,600 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:00:50,121 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 17:00:55,604 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 17:00:55,604 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:00:55,604 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 17:00:55,605 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:00:55,613 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:00:56,132 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 17:01:01,537 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 17:01:01,537 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:01:01,537 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 17:01:01,537 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:01,545 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:01:02,069 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 17:01:07,548 - root - [INFO] - 	!!!Scores: {'accuracy': 0.816, 'average': 0.816}
2024-05-01 17:01:07,548 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:01:07,548 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 17:01:07,548 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:07,556 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:01:08,086 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 17:01:13,565 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 17:01:13,566 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:01:13,566 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 17:01:13,566 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:13,574 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:01:14,097 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 17:01:19,514 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 17:01:19,514 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:01:19,514 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 17:01:19,514 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:19,522 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:01:20,040 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 17:01:25,654 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 17:01:25,654 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:01:25,654 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 17:01:25,654 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:25,662 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:01:26,176 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 17:01:31,652 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 17:01:31,652 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:01:31,652 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 17:01:31,652 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:31,661 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:01:32,185 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 17:01:37,762 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 17:01:37,762 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:37,762 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 17:01:37,762 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:37,770 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 17:01:38,515 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:01:38,524 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:38,589 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:40,037 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:01:40,038 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:40,038 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 17:01:40,038 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:40,046 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:40,097 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:41,551 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:01:41,551 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:41,551 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 17:01:41,551 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:41,559 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:41,624 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:43,096 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:01:43,096 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:43,096 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 17:01:43,096 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:43,104 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:43,156 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:44,597 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:01:44,597 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:44,597 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 17:01:44,597 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:44,605 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:44,656 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:46,102 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:01:46,102 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:46,102 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 17:01:46,102 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:46,110 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:46,175 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:47,629 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 17:01:47,629 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:47,629 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 17:01:47,630 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:47,638 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:47,689 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:49,132 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 17:01:49,132 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:49,132 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 17:01:49,132 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:49,140 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:49,205 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:50,661 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 17:01:50,661 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:50,661 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 17:01:50,661 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:50,669 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:50,721 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:52,171 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:01:52,171 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:52,171 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 17:01:52,171 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:52,179 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:52,231 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:53,683 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:01:53,683 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:53,683 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 17:01:53,683 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:53,691 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:53,757 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:55,216 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 17:01:55,216 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:55,216 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 17:01:55,217 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:55,225 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:55,276 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:56,725 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:01:56,725 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:56,725 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 17:01:56,726 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:56,734 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:56,786 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:58,278 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:01:58,278 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:58,278 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 17:01:58,279 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:58,287 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:58,338 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 17:01:59,786 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:01:59,787 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:01:59,787 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 17:01:59,787 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:01:59,795 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:01:59,860 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 17:02:01,344 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:02:01,344 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:02:01,344 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 17:02:01,344 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:02:02,061 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:02:02,116 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:02:05,726 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:02:15,490 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 17:02:15,490 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:02:15,490 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 17:02:15,491 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:02:15,499 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:02:19,055 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:02:28,917 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 17:02:28,917 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:02:28,917 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 17:02:28,917 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:02:28,926 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:02:32,538 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:02:42,448 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 17:02:42,448 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:02:42,448 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 17:02:42,448 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:02:42,457 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:02:46,099 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:02:56,333 - root - [INFO] - 	!!!Scores: {'accuracy': 0.664, 'average': 0.664}
2024-05-01 17:02:56,333 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:02:56,333 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 17:02:56,333 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:02:56,341 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:02:59,974 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:03:10,086 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 17:03:10,086 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:03:10,086 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 17:03:10,086 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:03:10,094 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 17:03:10,817 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:03:10,869 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:03:12,368 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 17:03:17,752 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 17:03:17,752 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:03:17,752 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 17:03:17,752 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:03:17,761 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:03:19,261 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 17:03:24,495 - root - [INFO] - 	!!!Scores: {'accuracy': 0.65, 'average': 0.65}
2024-05-01 17:03:24,496 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:03:24,496 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 17:03:24,496 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:03:24,504 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:03:26,013 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 17:03:31,779 - root - [INFO] - 	!!!Scores: {'accuracy': 0.644, 'average': 0.644}
2024-05-01 17:03:31,780 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:03:31,780 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 17:03:31,780 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:03:31,788 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:03:33,293 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 17:03:39,117 - root - [INFO] - 	!!!Scores: {'accuracy': 0.533, 'average': 0.533}
2024-05-01 17:03:39,117 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:03:39,117 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 17:03:39,118 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:03:39,125 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:03:40,606 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 17:03:46,001 - root - [INFO] - 	!!!Scores: {'accuracy': 0.538, 'average': 0.538}
2024-05-01 17:03:46,001 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:03:46,001 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 17:03:46,001 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:03:46,009 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:03:47,504 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 17:03:53,319 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 17:03:53,319 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:03:53,319 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 17:03:53,319 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:03:53,327 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:03:54,816 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 17:04:00,282 - root - [INFO] - 	!!!Scores: {'accuracy': 0.66, 'average': 0.66}
2024-05-01 17:04:00,282 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:04:00,283 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 17:04:00,283 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:00,290 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:04:01,768 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 17:04:07,321 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 17:04:07,321 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:04:07,321 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 17:04:07,321 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:07,329 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:04:08,845 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 17:04:14,933 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 17:04:14,933 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:04:14,933 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 17:04:14,933 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:14,941 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:04:16,425 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 17:04:21,202 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 17:04:21,202 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:21,202 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 17:04:21,202 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:21,210 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 17:04:22,202 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:04:22,219 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:22,407 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:24,210 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 17:04:24,210 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:24,210 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 17:04:24,211 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:24,218 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:24,392 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:26,160 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 17:04:26,160 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:26,160 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 17:04:26,160 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:26,168 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:26,386 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:28,316 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 17:04:28,316 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:28,316 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 17:04:28,316 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:28,324 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:28,540 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:30,478 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 17:04:30,478 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:30,478 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 17:04:30,479 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:30,487 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:30,673 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:32,433 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 17:04:32,433 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:32,433 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 17:04:32,433 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:32,442 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:32,618 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:34,426 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 17:04:34,426 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:34,426 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 17:04:34,427 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:34,435 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:34,609 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:36,427 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 17:04:36,427 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:36,427 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 17:04:36,427 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:36,435 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:36,712 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:38,491 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 17:04:38,491 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:38,491 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 17:04:38,491 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:38,499 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:38,675 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:40,483 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 17:04:40,483 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:04:40,483 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 17:04:40,483 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:40,491 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:04:40,783 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 17:04:42,554 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 17:04:42,555 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:04:42,555 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 17:04:42,555 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:42,563 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 17:04:43,530 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:04:43,545 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:04:43,765 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 17:04:45,365 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 17:04:45,365 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:04:45,365 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 17:04:45,365 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:45,373 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:04:45,584 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 17:04:47,213 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 17:04:47,213 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:04:47,213 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 17:04:47,213 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:47,221 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:04:47,455 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 17:04:49,064 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 17:04:49,064 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:04:49,064 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 17:04:49,064 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:49,073 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:04:49,299 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 17:04:50,856 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 17:04:50,857 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:04:50,857 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 17:04:50,857 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:50,869 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:04:51,110 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 17:04:52,726 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 17:04:52,726 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:04:52,726 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 17:04:52,726 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:52,735 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:04:52,947 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 17:04:54,567 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 17:04:54,567 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:04:54,567 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 17:04:54,567 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:54,576 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:04:54,787 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 17:04:56,364 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 17:04:56,364 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:04:56,364 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 17:04:56,364 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:56,373 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:04:56,584 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 17:04:58,157 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 17:04:58,157 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 17:04:58,157 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 17:04:58,157 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:04:58,172 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 17:04:59,148 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:05:00,006 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 17:05:22,416 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 17:10:37,411 - root - [INFO] - 	!!!Scores: {'accuracy': 0.426, 'average': 0.426}
2024-05-01 17:10:37,412 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:10:37,412 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 17:10:37,412 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:10:38,388 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 17:10:38,500 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:10:44,578 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:11:09,199 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 17:11:09,200 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:11:09,200 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 17:11:09,200 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:11:09,210 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:11:15,299 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:11:40,330 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 17:11:40,330 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:11:40,330 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 17:11:40,330 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:11:40,340 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:11:46,371 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:12:11,139 - root - [INFO] - 	!!!Scores: {'accuracy': 0.929, 'average': 0.929}
2024-05-01 17:12:11,139 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:12:11,139 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 17:12:11,139 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:12:11,149 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:12:17,216 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:12:42,070 - root - [INFO] - 	!!!Scores: {'accuracy': 0.916, 'average': 0.916}
2024-05-01 17:12:42,070 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:12:42,070 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 17:12:42,070 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:12:42,080 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:12:48,091 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:13:13,367 - root - [INFO] - 	!!!Scores: {'accuracy': 0.918, 'average': 0.918}
2024-05-01 17:13:13,367 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:13:13,367 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 17:13:13,367 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:13:14,341 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:13:14,392 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:13:16,207 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:13:34,691 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 17:13:34,692 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:13:34,692 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 17:13:34,692 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:13:34,700 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:13:36,545 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:13:53,077 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 17:13:53,077 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:13:53,077 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 17:13:53,077 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:13:53,085 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:13:54,911 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:14:11,168 - root - [INFO] - 	!!!Scores: {'accuracy': 0.685, 'average': 0.685}
2024-05-01 17:14:11,169 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:14:11,169 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 17:14:11,169 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:14:11,178 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:14:12,986 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:14:29,411 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 17:14:29,411 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:14:29,411 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 17:14:29,411 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:14:29,419 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:14:31,401 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:14:48,175 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 17:14:48,175 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:14:48,175 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 17:14:48,175 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:14:48,183 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:14:50,012 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:15:06,474 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 17:15:06,474 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:15:06,474 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 17:15:06,474 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:15:06,483 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:15:08,865 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:15:26,985 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 17:15:26,985 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:15:26,986 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 17:15:26,986 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:15:26,993 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:15:28,841 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:15:45,535 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 17:15:45,535 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:15:45,535 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 17:15:45,535 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:15:45,543 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:15:47,387 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:16:03,905 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 17:16:03,905 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:16:03,905 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 17:16:03,905 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:16:03,913 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:16:06,292 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:16:23,451 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 17:16:23,451 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:16:23,451 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 17:16:23,452 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:16:23,461 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:16:25,836 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:16:42,678 - root - [INFO] - 	!!!Scores: {'accuracy': 0.662, 'average': 0.662}
2024-05-01 17:16:42,678 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:16:42,678 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 17:16:42,678 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:16:42,686 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:16:44,491 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:17:01,010 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 17:17:01,010 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:17:01,010 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 17:17:01,010 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:17:01,018 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:17:03,377 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:17:20,240 - root - [INFO] - 	!!!Scores: {'accuracy': 0.641, 'average': 0.641}
2024-05-01 17:17:20,240 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:17:20,240 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 17:17:20,240 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:17:20,248 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:17:22,599 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:17:40,332 - root - [INFO] - 	!!!Scores: {'accuracy': 0.663, 'average': 0.663}
2024-05-01 17:17:40,332 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:17:40,332 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 17:17:40,332 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:17:40,340 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:17:42,183 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:17:58,704 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 17:17:58,704 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:17:58,704 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 17:17:58,704 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:17:59,676 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:17:59,730 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:18:01,533 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:18:19,440 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 17:18:19,440 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:18:19,440 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 17:18:19,440 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:18:19,448 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:18:21,288 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:18:37,524 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 17:18:37,524 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:18:37,524 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 17:18:37,524 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:18:37,532 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:18:39,348 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:18:55,191 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 17:18:55,191 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:18:55,191 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 17:18:55,192 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:18:55,199 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:18:57,001 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:19:13,038 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 17:19:13,038 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:19:13,038 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 17:19:13,038 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:19:13,046 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:19:14,845 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:19:31,305 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 17:19:31,305 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:19:31,305 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 17:19:31,305 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:19:31,313 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:19:33,132 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:19:49,237 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 17:19:49,237 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:19:49,237 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 17:19:49,237 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:19:49,245 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:19:51,621 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:20:09,267 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 17:20:09,267 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:20:09,267 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 17:20:09,267 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:20:09,275 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:20:11,116 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:20:27,501 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 17:20:27,501 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:20:27,501 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 17:20:27,501 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:20:27,509 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:20:29,350 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:20:45,581 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 17:20:45,581 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:20:45,581 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 17:20:45,581 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:20:45,589 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:20:47,967 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:21:04,812 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 17:21:04,812 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:21:04,812 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 17:21:04,812 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:21:04,820 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:21:07,187 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:21:23,748 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 17:21:23,749 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:21:23,749 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 17:21:23,749 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:21:23,756 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:21:25,559 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:21:41,805 - root - [INFO] - 	!!!Scores: {'accuracy': 0.524, 'average': 0.524}
2024-05-01 17:21:41,805 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:21:41,805 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 17:21:41,805 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:21:41,813 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:21:44,170 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:22:00,714 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 17:22:00,714 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:22:00,714 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 17:22:00,714 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:22:00,723 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:22:03,071 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:22:20,444 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 17:22:20,444 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:22:20,444 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 17:22:20,444 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:22:20,452 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:22:22,290 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:22:38,534 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 17:22:38,534 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:22:38,534 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 17:22:38,534 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:22:39,241 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:22:39,302 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:22:41,468 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:23:06,752 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 17:23:06,752 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:23:06,753 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 17:23:06,753 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:23:06,761 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:23:08,970 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:23:31,820 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 17:23:31,820 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:23:31,820 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 17:23:31,820 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:23:31,828 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:23:34,008 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:23:56,484 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 17:23:56,485 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:23:56,485 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 17:23:56,485 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:23:56,493 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:23:58,659 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:24:21,340 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 17:24:21,340 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:24:21,340 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 17:24:21,340 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:24:21,349 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:24:23,513 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:24:46,750 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 17:24:46,750 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:24:46,750 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 17:24:46,750 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:24:46,758 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:24:48,944 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:25:11,693 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 17:25:11,693 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:25:11,693 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 17:25:11,694 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:25:11,701 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:25:14,551 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:25:39,406 - root - [INFO] - 	!!!Scores: {'accuracy': 0.49, 'average': 0.49}
2024-05-01 17:25:39,406 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:25:39,407 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 17:25:39,407 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:25:39,415 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:25:41,623 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:26:04,766 - root - [INFO] - 	!!!Scores: {'accuracy': 0.488, 'average': 0.488}
2024-05-01 17:26:04,766 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:26:04,766 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 17:26:04,766 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:26:04,774 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:26:06,978 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:26:29,821 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 17:26:29,821 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:26:29,821 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 17:26:29,821 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:26:29,829 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:26:32,681 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:26:56,394 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 17:26:56,394 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:26:56,394 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 17:26:56,394 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:26:56,403 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:26:59,241 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:27:22,601 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 17:27:22,601 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:27:22,601 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 17:27:22,601 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:27:22,609 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:27:24,767 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:27:47,611 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 17:27:47,611 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:27:47,612 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 17:27:47,612 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:27:47,619 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:27:50,646 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:28:14,014 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 17:28:14,014 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:28:14,014 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 17:28:14,014 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:28:14,022 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:28:16,869 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:28:41,257 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 17:28:41,257 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:28:41,257 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 17:28:41,257 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:28:41,265 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:28:43,479 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:29:06,324 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 17:29:06,401 - root - [INFO] - Unexpected keys: []
2024-05-01 17:29:06,637 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:29:06,638 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 17:29:06,638 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:29:06,646 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 17:29:07,328 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:29:07,351 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:29:07,868 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 17:29:13,625 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 17:29:13,625 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:29:13,625 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 17:29:13,625 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:29:13,633 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:29:14,151 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 17:29:19,601 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 17:29:19,601 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:29:19,601 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 17:29:19,602 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:29:19,610 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:29:20,129 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 17:29:25,573 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 17:29:25,574 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:29:25,574 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 17:29:25,574 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:29:25,581 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:29:26,095 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 17:29:31,470 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 17:29:31,470 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:29:31,470 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 17:29:31,470 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:29:31,478 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:29:31,993 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 17:29:37,441 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 17:29:37,441 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:29:37,441 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 17:29:37,441 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:29:37,449 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:29:37,967 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 17:29:43,413 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:29:43,414 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:29:43,414 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 17:29:43,414 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:29:43,421 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:29:43,941 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 17:29:49,324 - root - [INFO] - 	!!!Scores: {'accuracy': 0.771, 'average': 0.771}
2024-05-01 17:29:49,324 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:29:49,324 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 17:29:49,324 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:29:49,332 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:29:49,846 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 17:29:55,430 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 17:29:55,430 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:29:55,430 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 17:29:55,431 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:29:55,438 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:29:55,952 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 17:30:01,392 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 17:30:01,392 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:30:01,392 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 17:30:01,392 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:01,400 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:30:01,920 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 17:30:07,458 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 17:30:07,458 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:07,459 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 17:30:07,459 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:07,466 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 17:30:08,393 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:30:08,402 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:08,466 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:09,911 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:30:09,911 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:09,911 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 17:30:09,911 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:09,919 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:09,970 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:11,416 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:30:11,416 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:11,416 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 17:30:11,416 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:11,424 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:11,487 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:12,956 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:30:12,956 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:12,956 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 17:30:12,956 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:12,964 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:13,015 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:14,452 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:30:14,452 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:14,452 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 17:30:14,452 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:14,459 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:14,510 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:15,953 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:30:15,953 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:15,953 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 17:30:15,953 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:15,961 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:16,025 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:17,476 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 17:30:17,476 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:17,476 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 17:30:17,476 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:17,483 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:17,535 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:18,973 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 17:30:18,973 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:18,973 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 17:30:18,973 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:18,981 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:19,045 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:20,494 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 17:30:20,494 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:20,494 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 17:30:20,494 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:20,502 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:20,553 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:21,997 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:30:21,997 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:21,997 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 17:30:21,997 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:22,005 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:22,056 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:23,502 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:30:23,502 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:23,502 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 17:30:23,502 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:23,510 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:23,574 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:25,029 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:30:25,030 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:25,030 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 17:30:25,030 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:25,037 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:25,088 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:26,532 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:30:26,532 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:26,532 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 17:30:26,532 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:26,540 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:26,591 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:28,077 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:30:28,077 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:28,077 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 17:30:28,078 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:28,085 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:28,136 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:29,579 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:30:29,579 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:30:29,579 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 17:30:29,579 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:29,587 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:30:29,651 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 17:30:31,130 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:30:31,130 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:30:31,130 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 17:30:31,130 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:32,092 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:30:32,146 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:30:35,749 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:30:45,381 - root - [INFO] - 	!!!Scores: {'accuracy': 0.691, 'average': 0.691}
2024-05-01 17:30:45,381 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:30:45,381 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 17:30:45,381 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:45,389 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:30:48,936 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:30:58,651 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 17:30:58,651 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:30:58,651 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 17:30:58,651 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:30:58,659 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:31:02,266 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:31:12,054 - root - [INFO] - 	!!!Scores: {'accuracy': 0.683, 'average': 0.683}
2024-05-01 17:31:12,054 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:31:12,054 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 17:31:12,054 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:31:12,062 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:31:15,712 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:31:25,818 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 17:31:25,819 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:31:25,819 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 17:31:25,819 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:31:25,827 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:31:29,425 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:31:39,410 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 17:31:39,410 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:31:39,410 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 17:31:39,410 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:31:39,418 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 17:31:40,366 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:31:40,418 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:31:41,908 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 17:31:47,235 - root - [INFO] - 	!!!Scores: {'accuracy': 0.644, 'average': 0.644}
2024-05-01 17:31:47,235 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:31:47,235 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 17:31:47,236 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:31:47,243 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:31:48,734 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 17:31:53,920 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 17:31:53,920 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:31:53,921 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 17:31:53,921 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:31:53,928 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:31:55,424 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 17:32:01,129 - root - [INFO] - 	!!!Scores: {'accuracy': 0.644, 'average': 0.644}
2024-05-01 17:32:01,130 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:32:01,130 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 17:32:01,130 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:01,137 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:32:02,630 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 17:32:08,456 - root - [INFO] - 	!!!Scores: {'accuracy': 0.531, 'average': 0.531}
2024-05-01 17:32:08,456 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:32:08,456 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 17:32:08,456 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:08,464 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:32:09,942 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 17:32:15,351 - root - [INFO] - 	!!!Scores: {'accuracy': 0.54, 'average': 0.54}
2024-05-01 17:32:15,351 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:32:15,351 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 17:32:15,351 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:15,359 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:32:16,854 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 17:32:22,693 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 17:32:22,693 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:32:22,693 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 17:32:22,694 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:22,701 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:32:24,191 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 17:32:29,605 - root - [INFO] - 	!!!Scores: {'accuracy': 0.657, 'average': 0.657}
2024-05-01 17:32:29,605 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:32:29,605 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 17:32:29,606 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:29,613 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:32:31,092 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 17:32:36,659 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 17:32:36,659 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:32:36,659 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 17:32:36,659 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:36,667 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:32:38,161 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 17:32:44,238 - root - [INFO] - 	!!!Scores: {'accuracy': 0.629, 'average': 0.629}
2024-05-01 17:32:44,238 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 17:32:44,238 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 17:32:44,238 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:44,246 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 17:32:45,720 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 17:32:50,451 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 17:32:50,451 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:32:50,451 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 17:32:50,451 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:50,459 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 17:32:51,431 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:32:51,448 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:32:51,638 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 17:32:53,428 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 17:32:53,428 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:32:53,429 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 17:32:53,429 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:53,436 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:32:53,610 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 17:32:55,370 - root - [INFO] - 	!!!Scores: {'accuracy': 0.542, 'average': 0.542}
2024-05-01 17:32:55,370 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:32:55,370 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 17:32:55,370 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:55,377 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:32:55,594 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 17:32:57,518 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 17:32:57,518 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:32:57,518 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 17:32:57,518 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:57,526 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:32:57,742 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 17:32:59,671 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 17:32:59,672 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:32:59,672 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 17:32:59,672 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:32:59,680 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:32:59,862 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 17:33:01,614 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 17:33:01,614 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:33:01,614 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 17:33:01,614 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:01,622 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:33:01,797 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 17:33:03,596 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 17:33:03,596 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:33:03,597 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 17:33:03,597 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:03,604 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:33:03,777 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 17:33:05,588 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 17:33:05,588 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:33:05,588 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 17:33:05,588 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:05,596 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:33:05,868 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 17:33:07,638 - root - [INFO] - 	!!!Scores: {'accuracy': 0.528, 'average': 0.528}
2024-05-01 17:33:07,638 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:33:07,638 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 17:33:07,638 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:07,646 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:33:07,819 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 17:33:09,616 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 17:33:09,617 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 17:33:09,617 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 17:33:09,617 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:09,625 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 17:33:09,913 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 17:33:11,674 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 17:33:11,674 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:33:11,674 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 17:33:11,674 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:11,682 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 17:33:12,603 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:33:12,618 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:33:12,834 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 17:33:14,422 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 17:33:14,422 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:33:14,423 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 17:33:14,423 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:14,430 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:33:14,638 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 17:33:16,258 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 17:33:16,258 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:33:16,258 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 17:33:16,258 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:16,266 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:33:16,474 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 17:33:18,071 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 17:33:18,071 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:33:18,071 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 17:33:18,071 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:18,079 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:33:18,302 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 17:33:19,835 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 17:33:19,835 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:33:19,835 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 17:33:19,835 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:19,843 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:33:20,050 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 17:33:21,650 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 17:33:21,650 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:33:21,650 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 17:33:21,650 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:21,658 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:33:21,867 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 17:33:23,476 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 17:33:23,477 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:33:23,477 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 17:33:23,477 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:23,484 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:33:23,692 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 17:33:25,257 - root - [INFO] - 	!!!Scores: {'accuracy': 0.824, 'average': 0.824}
2024-05-01 17:33:25,257 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 17:33:25,257 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 17:33:25,257 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:25,265 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 17:33:25,472 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 17:33:27,029 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 17:33:27,029 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 17:33:27,029 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 17:33:27,029 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:33:27,043 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 17:33:27,992 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:33:28,678 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 17:33:50,516 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 17:39:03,625 - root - [INFO] - 	!!!Scores: {'accuracy': 0.425, 'average': 0.425}
2024-05-01 17:39:03,625 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:39:03,625 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 17:39:03,626 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:39:04,582 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 17:39:04,695 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:39:10,659 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:39:34,948 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 17:39:34,948 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:39:34,948 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 17:39:34,948 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:39:34,956 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:39:40,994 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:40:05,732 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 17:40:05,732 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:40:05,732 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 17:40:05,732 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:40:05,741 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:40:11,908 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:40:36,495 - root - [INFO] - 	!!!Scores: {'accuracy': 0.927, 'average': 0.927}
2024-05-01 17:40:36,495 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:40:36,495 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 17:40:36,495 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:40:36,504 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:40:42,587 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:41:07,194 - root - [INFO] - 	!!!Scores: {'accuracy': 0.917, 'average': 0.917}
2024-05-01 17:41:07,195 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 17:41:07,195 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 17:41:07,195 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:41:07,203 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 17:41:13,258 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 17:41:38,544 - root - [INFO] - 	!!!Scores: {'accuracy': 0.918, 'average': 0.918}
2024-05-01 17:41:38,544 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:41:38,544 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 17:41:38,544 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:41:39,507 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:41:39,561 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:41:41,372 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:41:59,839 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 17:41:59,839 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:41:59,839 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 17:41:59,839 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:41:59,847 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:42:01,690 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:42:18,221 - root - [INFO] - 	!!!Scores: {'accuracy': 0.664, 'average': 0.664}
2024-05-01 17:42:18,221 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:42:18,221 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 17:42:18,221 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:42:18,229 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:42:20,047 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:42:36,287 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 17:42:36,287 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:42:36,287 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 17:42:36,288 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:42:36,295 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:42:38,098 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:42:54,517 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 17:42:54,518 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:42:54,518 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 17:42:54,518 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:42:54,526 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:42:56,326 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:43:13,102 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 17:43:13,102 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:43:13,102 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 17:43:13,102 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:43:13,111 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:43:14,931 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:43:31,390 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 17:43:31,390 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:43:31,390 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 17:43:31,390 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:43:31,398 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:43:33,772 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:43:51,888 - root - [INFO] - 	!!!Scores: {'accuracy': 0.663, 'average': 0.663}
2024-05-01 17:43:51,888 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:43:51,888 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 17:43:51,888 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:43:51,896 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:43:53,736 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:44:10,421 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 17:44:10,421 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:44:10,422 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 17:44:10,422 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:44:10,429 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:44:12,266 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:44:28,776 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 17:44:28,776 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:44:28,776 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 17:44:28,776 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:44:28,784 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:44:31,158 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:44:48,312 - root - [INFO] - 	!!!Scores: {'accuracy': 0.631, 'average': 0.631}
2024-05-01 17:44:48,312 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:44:48,312 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 17:44:48,312 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:44:48,320 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:44:50,686 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:45:07,529 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 17:45:07,529 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:45:07,529 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 17:45:07,529 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:45:07,538 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:45:09,337 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:45:25,859 - root - [INFO] - 	!!!Scores: {'accuracy': 0.664, 'average': 0.664}
2024-05-01 17:45:25,859 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:45:25,859 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 17:45:25,859 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:45:25,867 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:45:28,220 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:45:45,082 - root - [INFO] - 	!!!Scores: {'accuracy': 0.632, 'average': 0.632}
2024-05-01 17:45:45,083 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:45:45,083 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 17:45:45,083 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:45:45,091 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:45:47,435 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:46:05,167 - root - [INFO] - 	!!!Scores: {'accuracy': 0.657, 'average': 0.657}
2024-05-01 17:46:05,167 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 17:46:05,167 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 17:46:05,167 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:46:05,175 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:46:07,011 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:46:23,533 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 17:46:23,533 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:46:23,533 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 17:46:23,533 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:46:24,504 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:46:24,557 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:46:26,357 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:46:44,276 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 17:46:44,276 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:46:44,277 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 17:46:44,277 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:46:44,285 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:46:46,121 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:47:02,363 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 17:47:02,363 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:47:02,363 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 17:47:02,363 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:47:02,372 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:47:04,187 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:47:20,040 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 17:47:20,040 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:47:20,040 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 17:47:20,040 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:47:20,048 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:47:21,847 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:47:37,890 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 17:47:37,890 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:47:37,890 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 17:47:37,890 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:47:37,898 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:47:39,693 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:47:56,146 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 17:47:56,147 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:47:56,147 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 17:47:56,147 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:47:56,155 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:47:57,977 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:48:14,080 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 17:48:14,080 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:48:14,080 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 17:48:14,080 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:48:14,088 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:48:16,459 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:48:34,099 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 17:48:34,099 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:48:34,099 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 17:48:34,099 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:48:34,107 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:48:35,947 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:48:52,335 - root - [INFO] - 	!!!Scores: {'accuracy': 0.522, 'average': 0.522}
2024-05-01 17:48:52,335 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:48:52,335 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 17:48:52,335 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:48:52,343 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:48:54,183 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:49:10,417 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 17:49:10,417 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:49:10,417 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 17:49:10,417 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:49:10,426 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:49:12,797 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:49:29,646 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 17:49:29,646 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:49:29,646 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 17:49:29,646 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:49:29,654 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:49:32,018 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:49:48,576 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 17:49:48,576 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:49:48,576 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 17:49:48,576 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:49:48,584 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:49:50,379 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:50:06,619 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 17:50:06,619 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:50:06,619 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 17:50:06,619 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:50:06,627 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:50:08,979 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:50:25,512 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 17:50:25,512 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:50:25,512 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 17:50:25,512 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:50:25,520 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:50:27,861 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:50:45,224 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 17:50:45,224 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 17:50:45,224 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 17:50:45,224 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:50:45,232 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 17:50:47,067 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 17:51:03,300 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 17:51:03,300 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:51:03,300 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 17:51:03,300 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:51:04,241 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:51:04,300 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:51:06,458 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:51:31,733 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 17:51:31,733 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:51:31,733 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 17:51:31,733 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:51:31,741 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:51:33,945 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:51:56,788 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 17:51:56,788 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:51:56,788 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 17:51:56,789 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:51:56,797 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:51:58,974 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:52:21,443 - root - [INFO] - 	!!!Scores: {'accuracy': 0.486, 'average': 0.486}
2024-05-01 17:52:21,443 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:52:21,443 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 17:52:21,443 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:52:21,451 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:52:23,609 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:52:46,280 - root - [INFO] - 	!!!Scores: {'accuracy': 0.508, 'average': 0.508}
2024-05-01 17:52:46,280 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:52:46,280 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 17:52:46,280 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:52:46,288 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:52:48,441 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:53:11,666 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 17:53:11,666 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:53:11,666 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 17:53:11,666 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:53:11,674 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:53:14,071 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:53:36,810 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 17:53:36,810 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:53:36,810 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 17:53:36,810 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:53:36,818 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:53:39,668 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:54:04,519 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 17:54:04,519 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:54:04,519 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 17:54:04,519 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:54:04,528 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:54:06,737 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:54:29,862 - root - [INFO] - 	!!!Scores: {'accuracy': 0.477, 'average': 0.477}
2024-05-01 17:54:29,862 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:54:29,862 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 17:54:29,863 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:54:29,871 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:54:32,082 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:54:54,912 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 17:54:54,912 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:54:54,912 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 17:54:54,912 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:54:54,920 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:54:57,769 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:55:21,468 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 17:55:21,468 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:55:21,468 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 17:55:21,468 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:55:21,476 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:55:24,315 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:55:47,666 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 17:55:47,667 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:55:47,667 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 17:55:47,667 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:55:47,676 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:55:49,837 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:56:12,670 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 17:56:12,670 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:56:12,670 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 17:56:12,670 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:56:12,678 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:56:15,503 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:56:38,855 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 17:56:38,855 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:56:38,855 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 17:56:38,855 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:56:38,863 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:56:41,687 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:57:06,066 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 17:57:06,066 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 17:57:06,066 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 17:57:06,066 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:57:06,075 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 17:57:08,294 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 17:57:31,132 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 17:57:31,209 - root - [INFO] - Unexpected keys: []
2024-05-01 17:57:31,439 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:57:31,439 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 17:57:31,439 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:57:31,447 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 17:57:32,147 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:57:32,169 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:57:32,684 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 17:57:38,440 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 17:57:38,440 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:57:38,440 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 17:57:38,440 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:57:38,447 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:57:38,965 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 17:57:44,409 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 17:57:44,409 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:57:44,409 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 17:57:44,409 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:57:44,416 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:57:44,933 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 17:57:50,377 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 17:57:50,377 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:57:50,377 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 17:57:50,377 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:57:50,384 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:57:50,898 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 17:57:56,269 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 17:57:56,269 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:57:56,269 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 17:57:56,269 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:57:56,276 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:57:56,790 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 17:58:02,231 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 17:58:02,231 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:58:02,231 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 17:58:02,231 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:02,239 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:58:02,757 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 17:58:08,202 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:58:08,202 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:58:08,202 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 17:58:08,202 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:08,210 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:58:08,727 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 17:58:14,107 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 17:58:14,107 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:58:14,107 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 17:58:14,107 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:14,115 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:58:14,628 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 17:58:20,208 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 17:58:20,208 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:58:20,208 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 17:58:20,209 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:20,216 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:58:20,729 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 17:58:26,168 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:58:26,168 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 17:58:26,168 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 17:58:26,168 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:26,175 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 17:58:26,694 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 17:58:32,233 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 17:58:32,233 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:32,233 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 17:58:32,234 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:32,241 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 17:58:33,159 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:58:33,168 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:33,233 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:34,675 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:58:34,675 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:34,676 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 17:58:34,676 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:34,683 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:34,734 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:36,180 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:58:36,180 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:36,180 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 17:58:36,181 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:36,188 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:36,252 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:37,718 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:58:37,718 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:37,718 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 17:58:37,718 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:37,725 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:37,776 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:39,210 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:58:39,211 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:39,211 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 17:58:39,211 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:39,218 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:39,268 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:40,708 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:58:40,708 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:40,708 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 17:58:40,708 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:40,715 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:40,779 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:42,228 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 17:58:42,228 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:42,228 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 17:58:42,229 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:42,236 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:42,286 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:43,723 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 17:58:43,723 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:43,723 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 17:58:43,723 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:43,730 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:43,794 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:45,244 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 17:58:45,244 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:45,244 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 17:58:45,244 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:45,251 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:45,302 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:46,745 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:58:46,746 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:46,746 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 17:58:46,746 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:46,753 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:46,804 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:48,250 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:58:48,250 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:48,250 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 17:58:48,250 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:48,258 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:48,322 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:49,778 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 17:58:49,778 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:49,778 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 17:58:49,778 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:49,786 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:49,836 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:51,279 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:58:51,279 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:51,279 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 17:58:51,279 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:51,287 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:51,337 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:52,825 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 17:58:52,825 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:52,825 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 17:58:52,825 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:52,832 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:52,883 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:54,324 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:58:54,324 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 17:58:54,325 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 17:58:54,325 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:54,332 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 17:58:54,396 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 17:58:55,875 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 17:58:55,876 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:58:55,876 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 17:58:55,876 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:58:56,799 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 17:58:56,855 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:59:00,450 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:59:10,079 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 17:59:10,079 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:59:10,079 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 17:59:10,079 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:59:10,087 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:59:13,611 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:59:23,324 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 17:59:23,324 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:59:23,324 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 17:59:23,324 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:59:23,333 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:59:26,925 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:59:36,710 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 17:59:36,710 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:59:36,710 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 17:59:36,710 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:59:36,719 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:59:40,337 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 17:59:50,442 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 17:59:50,443 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 17:59:50,443 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 17:59:50,443 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 17:59:50,450 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 17:59:54,042 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 18:00:04,019 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 18:00:04,020 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:00:04,020 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 18:00:04,020 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:00:04,027 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 18:00:05,010 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:00:05,063 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:00:06,549 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 18:00:11,871 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 18:00:11,871 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:00:11,871 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 18:00:11,871 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:00:11,879 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:00:13,365 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 18:00:18,549 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 18:00:18,549 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:00:18,549 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 18:00:18,550 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:00:18,557 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:00:20,048 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 18:00:25,746 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 18:00:25,746 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:00:25,746 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 18:00:25,746 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:00:25,754 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:00:27,243 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 18:00:33,069 - root - [INFO] - 	!!!Scores: {'accuracy': 0.535, 'average': 0.535}
2024-05-01 18:00:33,069 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:00:33,069 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 18:00:33,070 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:00:33,077 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:00:34,553 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 18:00:39,961 - root - [INFO] - 	!!!Scores: {'accuracy': 0.54, 'average': 0.54}
2024-05-01 18:00:39,961 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:00:39,961 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 18:00:39,961 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:00:39,969 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:00:41,461 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 18:00:47,294 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 18:00:47,294 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:00:47,294 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 18:00:47,294 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:00:47,302 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:00:48,811 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 18:00:54,217 - root - [INFO] - 	!!!Scores: {'accuracy': 0.66, 'average': 0.66}
2024-05-01 18:00:54,217 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:00:54,218 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 18:00:54,218 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:00:54,225 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:00:55,714 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 18:01:01,277 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 18:01:01,277 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:01:01,277 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 18:01:01,277 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:01,285 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:01:02,777 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 18:01:08,853 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 18:01:08,853 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:01:08,853 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 18:01:08,853 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:08,862 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:01:10,335 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 18:01:15,060 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 18:01:15,060 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:15,060 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 18:01:15,060 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:15,068 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 18:01:15,975 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:01:15,990 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:16,179 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:17,969 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 18:01:17,969 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:17,969 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 18:01:17,970 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:17,977 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:18,149 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:19,909 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 18:01:19,909 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:19,909 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 18:01:19,909 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:19,916 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:20,132 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:22,055 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 18:01:22,055 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:22,055 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 18:01:22,055 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:22,062 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:22,277 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:24,204 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 18:01:24,204 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:24,204 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 18:01:24,204 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:24,211 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:24,393 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:26,143 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 18:01:26,143 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:26,143 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 18:01:26,143 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:26,150 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:26,325 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:28,124 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 18:01:28,124 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:28,124 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 18:01:28,124 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:28,131 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:28,304 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:30,115 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 18:01:30,115 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:30,115 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 18:01:30,115 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:30,122 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:30,395 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:32,164 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 18:01:32,164 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:32,164 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 18:01:32,164 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:32,172 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:32,345 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:34,140 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 18:01:34,140 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:01:34,140 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 18:01:34,141 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:34,148 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:01:34,436 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 18:01:36,195 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 18:01:36,195 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:01:36,195 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 18:01:36,195 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:36,203 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 18:01:37,110 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:01:37,125 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:01:37,343 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 18:01:38,928 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 18:01:38,928 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:01:38,928 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 18:01:38,929 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:38,936 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:01:39,144 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 18:01:40,763 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 18:01:40,763 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:01:40,764 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 18:01:40,764 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:40,771 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:01:40,980 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 18:01:42,574 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 18:01:42,574 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:01:42,575 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 18:01:42,575 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:42,582 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:01:42,804 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 18:01:44,337 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 18:01:44,337 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:01:44,337 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 18:01:44,337 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:44,344 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:01:44,552 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 18:01:46,150 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 18:01:46,150 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:01:46,150 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 18:01:46,150 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:46,157 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:01:46,367 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 18:01:47,975 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 18:01:47,975 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:01:47,975 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 18:01:47,975 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:47,982 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:01:48,190 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 18:01:49,750 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 18:01:49,750 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:01:49,750 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 18:01:49,750 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:49,757 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:01:49,965 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 18:01:51,517 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 18:01:51,517 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 18:01:51,517 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 18:01:51,517 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:01:51,531 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 18:01:52,434 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:01:53,120 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 18:02:15,137 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 18:07:28,238 - root - [INFO] - 	!!!Scores: {'accuracy': 0.426, 'average': 0.426}
2024-05-01 18:07:28,238 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 18:07:28,238 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 18:07:28,239 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:07:29,162 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 18:07:29,275 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 18:07:35,258 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 18:07:59,554 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 18:07:59,554 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 18:07:59,555 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 18:07:59,555 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:07:59,563 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 18:08:05,615 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 18:08:30,355 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 18:08:30,355 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 18:08:30,355 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 18:08:30,355 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:08:30,365 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 18:08:36,387 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 18:09:00,965 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 18:09:00,965 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 18:09:00,965 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 18:09:00,965 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:09:00,973 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 18:09:07,037 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 18:09:31,653 - root - [INFO] - 	!!!Scores: {'accuracy': 0.915, 'average': 0.915}
2024-05-01 18:09:31,654 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 18:09:31,654 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 18:09:31,654 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:09:31,662 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 18:09:37,713 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 18:10:03,006 - root - [INFO] - 	!!!Scores: {'accuracy': 0.918, 'average': 0.918}
2024-05-01 18:10:03,006 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:10:03,006 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 18:10:03,007 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:10:03,711 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:10:03,763 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:10:05,575 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:10:24,049 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 18:10:24,049 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:10:24,049 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 18:10:24,049 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:10:24,058 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:10:25,897 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:10:42,434 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 18:10:42,434 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:10:42,434 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 18:10:42,434 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:10:42,442 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:10:44,258 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:11:00,518 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 18:11:00,519 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:11:00,519 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 18:11:00,519 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:11:00,527 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:11:02,326 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:11:18,753 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 18:11:18,754 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:11:18,754 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 18:11:18,754 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:11:18,762 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:11:20,558 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:11:37,332 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 18:11:37,332 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:11:37,332 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 18:11:37,332 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:11:37,340 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:11:39,165 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:11:55,625 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 18:11:55,625 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:11:55,625 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 18:11:55,625 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:11:55,633 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:11:58,006 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:12:16,133 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 18:12:16,133 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:12:16,133 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 18:12:16,133 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:12:16,141 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:12:17,983 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:12:34,681 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 18:12:34,681 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:12:34,681 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 18:12:34,681 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:12:34,689 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:12:36,525 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:12:53,046 - root - [INFO] - 	!!!Scores: {'accuracy': 0.675, 'average': 0.675}
2024-05-01 18:12:53,046 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:12:53,046 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 18:12:53,046 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:12:53,055 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:12:55,434 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:13:12,597 - root - [INFO] - 	!!!Scores: {'accuracy': 0.636, 'average': 0.636}
2024-05-01 18:13:12,597 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:13:12,597 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 18:13:12,597 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:13:12,605 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:13:14,980 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:13:31,823 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 18:13:31,823 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:13:31,823 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 18:13:31,823 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:13:31,831 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:13:33,636 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:13:50,163 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 18:13:50,163 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:13:50,163 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 18:13:50,163 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:13:50,171 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:13:52,523 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:14:09,390 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 18:14:09,391 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:14:09,391 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 18:14:09,391 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:14:09,399 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:14:11,742 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:14:29,472 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 18:14:29,472 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 18:14:29,472 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 18:14:29,472 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:14:29,480 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:14:31,324 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:14:47,847 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 18:14:47,847 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:14:47,847 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 18:14:47,848 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:14:48,758 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:14:48,809 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:14:50,606 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:15:08,523 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 18:15:08,523 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:15:08,523 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 18:15:08,523 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:15:08,531 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:15:10,368 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:15:26,615 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 18:15:26,615 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:15:26,615 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 18:15:26,615 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:15:26,624 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:15:28,437 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:15:44,287 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 18:15:44,287 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:15:44,287 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 18:15:44,287 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:15:44,295 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:15:46,093 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:16:02,139 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 18:16:02,140 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:16:02,140 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 18:16:02,140 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:16:02,148 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:16:03,942 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:16:20,399 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 18:16:20,399 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:16:20,399 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 18:16:20,399 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:16:20,407 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:16:22,219 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:16:38,326 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 18:16:38,326 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:16:38,326 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 18:16:38,326 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:16:38,334 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:16:40,903 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:16:58,546 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 18:16:58,546 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:16:58,546 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 18:16:58,547 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:16:58,554 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:17:00,402 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:17:16,794 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 18:17:16,794 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:17:16,794 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 18:17:16,794 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:17:16,802 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:17:18,644 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:17:34,888 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 18:17:34,888 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:17:34,888 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 18:17:34,888 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:17:34,896 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:17:37,276 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:17:54,132 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 18:17:54,132 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:17:54,132 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 18:17:54,132 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:17:54,140 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:17:56,509 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:18:13,080 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 18:18:13,080 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:18:13,080 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 18:18:13,080 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:18:13,089 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:18:14,891 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:18:31,134 - root - [INFO] - 	!!!Scores: {'accuracy': 0.524, 'average': 0.524}
2024-05-01 18:18:31,135 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:18:31,135 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 18:18:31,135 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:18:31,143 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:18:33,499 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:18:50,034 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 18:18:50,034 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:18:50,034 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 18:18:50,034 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:18:50,042 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:18:52,388 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:19:09,758 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 18:19:09,758 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 18:19:09,758 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 18:19:09,758 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:19:09,766 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 18:19:11,604 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 18:19:27,852 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 18:19:27,853 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:19:27,853 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 18:19:27,853 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:19:28,792 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:19:28,851 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:19:31,010 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:19:56,290 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 18:19:56,290 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:19:56,290 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 18:19:56,290 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:19:56,299 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:19:58,505 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:20:21,358 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 18:20:21,358 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:20:21,358 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 18:20:21,358 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:20:21,366 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:20:23,541 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:20:46,015 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 18:20:46,015 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:20:46,015 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 18:20:46,015 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:20:46,023 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:20:48,179 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:21:10,848 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 18:21:10,848 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:21:10,848 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 18:21:10,848 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:21:10,857 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:21:13,006 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:21:36,239 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 18:21:36,239 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:21:36,239 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 18:21:36,240 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:21:36,247 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:21:38,420 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:22:01,164 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 18:22:01,164 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:22:01,164 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 18:22:01,164 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:22:01,176 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:22:04,019 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:22:28,941 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 18:22:28,941 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:22:28,941 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 18:22:28,941 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:22:28,949 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:22:31,154 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:22:54,340 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 18:22:54,341 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:22:54,341 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 18:22:54,341 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:22:54,349 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:22:56,562 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:23:19,421 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 18:23:19,421 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:23:19,421 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 18:23:19,421 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:23:19,429 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:23:22,282 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:23:45,996 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 18:23:45,996 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:23:45,996 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 18:23:45,996 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:23:46,004 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:23:48,839 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:24:12,195 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 18:24:12,196 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:24:12,196 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 18:24:12,196 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:24:12,204 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:24:14,359 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:24:37,206 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 18:24:37,207 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:24:37,207 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 18:24:37,207 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:24:37,215 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:24:40,036 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:25:03,395 - root - [INFO] - 	!!!Scores: {'accuracy': 0.51, 'average': 0.51}
2024-05-01 18:25:03,395 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:25:03,395 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 18:25:03,396 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:25:03,404 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:25:06,212 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:25:30,604 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 18:25:30,604 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 18:25:30,604 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 18:25:30,604 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:25:30,612 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 18:25:32,814 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 18:25:55,659 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 18:25:55,735 - root - [INFO] - Unexpected keys: []
2024-05-01 18:25:55,965 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:25:55,965 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 18:25:55,965 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:25:55,973 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 18:25:56,662 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:25:56,685 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:25:57,200 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:02,957 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 18:26:02,957 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:26:02,957 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 18:26:02,957 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:02,965 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:26:03,481 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:08,928 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 18:26:08,928 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:26:08,928 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 18:26:08,928 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:08,936 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:26:09,452 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:14,898 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 18:26:14,898 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:26:14,898 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 18:26:14,898 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:14,906 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:26:15,419 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:20,793 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 18:26:20,793 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:26:20,793 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 18:26:20,793 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:20,801 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:26:21,324 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:26,771 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 18:26:26,771 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:26:26,771 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 18:26:26,771 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:26,779 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:26:27,297 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:32,745 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 18:26:32,745 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:26:32,745 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 18:26:32,745 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:32,753 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:26:33,270 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:38,651 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 18:26:38,651 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:26:38,651 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 18:26:38,651 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:38,659 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:26:39,172 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:44,755 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 18:26:44,755 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:26:44,755 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 18:26:44,755 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:44,763 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:26:45,276 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:50,717 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 18:26:50,717 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 18:26:50,717 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 18:26:50,718 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:50,725 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 18:26:51,245 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 18:26:56,781 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 18:26:56,781 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:26:56,782 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 18:26:56,782 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:56,789 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 18:26:57,463 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:26:57,474 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:26:57,538 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 18:26:58,984 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 18:26:58,984 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:26:58,984 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 18:26:58,984 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:26:58,992 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:26:59,042 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:00,488 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 18:27:00,489 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:00,489 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 18:27:00,489 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:00,496 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:00,560 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:02,028 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 18:27:02,028 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:02,028 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 18:27:02,028 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:02,036 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:02,087 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:03,523 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 18:27:03,523 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:03,523 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 18:27:03,523 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:03,530 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:03,581 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:05,022 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 18:27:05,022 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:05,022 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 18:27:05,022 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:05,030 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:05,094 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:06,543 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 18:27:06,543 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:06,543 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 18:27:06,543 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:06,550 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:06,601 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:08,040 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 18:27:08,040 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:08,040 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 18:27:08,040 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:08,047 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:08,111 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:09,563 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 18:27:09,563 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:09,563 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 18:27:09,563 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:09,570 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:09,621 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:11,065 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 18:27:11,065 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:11,065 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 18:27:11,065 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:11,073 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:11,124 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:12,572 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 18:27:12,572 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:12,572 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 18:27:12,572 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:12,580 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:12,644 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:14,099 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 18:27:14,099 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:14,099 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 18:27:14,099 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:14,107 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:14,157 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:15,599 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 18:27:15,600 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:15,600 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 18:27:15,600 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:15,607 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:15,658 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:17,147 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 18:27:17,147 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:17,147 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 18:27:17,147 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:17,154 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:17,206 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:18,650 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 18:27:18,651 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 18:27:18,651 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 18:27:18,651 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:18,658 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 18:27:18,723 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 18:27:20,205 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 18:27:20,205 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 18:27:20,205 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 18:27:20,205 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:20,903 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:27:20,958 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 18:27:24,566 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 18:27:34,211 - root - [INFO] - 	!!!Scores: {'accuracy': 0.68, 'average': 0.68}
2024-05-01 18:27:34,211 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 18:27:34,211 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 18:27:34,212 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:34,219 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 18:27:37,760 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 18:27:47,485 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 18:27:47,485 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 18:27:47,485 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 18:27:47,485 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:27:47,493 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 18:27:51,100 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 18:28:00,872 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 18:28:00,873 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 18:28:00,873 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 18:28:00,873 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:28:00,881 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 18:28:04,529 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 18:28:14,640 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 18:28:14,640 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 18:28:14,640 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 18:28:14,640 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:28:14,648 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 18:28:18,256 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 18:28:28,233 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 18:28:28,233 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:28:28,233 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 18:28:28,233 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:28:28,242 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 18:28:28,957 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:28:29,010 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:28:30,500 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 18:28:35,827 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 18:28:35,827 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:28:35,827 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 18:28:35,827 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:28:35,835 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:28:37,327 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 18:28:42,524 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 18:28:42,524 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:28:42,524 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 18:28:42,524 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:28:42,532 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:28:44,042 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 18:28:49,743 - root - [INFO] - 	!!!Scores: {'accuracy': 0.652, 'average': 0.652}
2024-05-01 18:28:49,743 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:28:49,743 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 18:28:49,743 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:28:49,751 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:28:51,242 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 18:28:57,065 - root - [INFO] - 	!!!Scores: {'accuracy': 0.558, 'average': 0.558}
2024-05-01 18:28:57,065 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:28:57,065 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 18:28:57,065 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:28:57,073 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:28:58,554 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 18:29:03,962 - root - [INFO] - 	!!!Scores: {'accuracy': 0.563, 'average': 0.563}
2024-05-01 18:29:03,962 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:29:03,962 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 18:29:03,962 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:03,970 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:29:05,466 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 18:29:11,302 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 18:29:11,303 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:29:11,303 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 18:29:11,303 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:11,310 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:29:12,801 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 18:29:18,215 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 18:29:18,215 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:29:18,215 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 18:29:18,215 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:18,223 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:29:19,700 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 18:29:25,268 - root - [INFO] - 	!!!Scores: {'accuracy': 0.538, 'average': 0.538}
2024-05-01 18:29:25,268 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:29:25,268 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 18:29:25,268 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:25,276 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:29:26,772 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 18:29:32,850 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 18:29:32,850 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 18:29:32,850 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 18:29:32,850 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:32,858 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 18:29:34,335 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 18:29:39,069 - root - [INFO] - 	!!!Scores: {'accuracy': 0.657, 'average': 0.657}
2024-05-01 18:29:39,069 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:39,069 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 18:29:39,069 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:39,076 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 18:29:39,762 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:29:39,780 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:39,969 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 18:29:41,760 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 18:29:41,760 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:41,760 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 18:29:41,760 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:41,768 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:41,940 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 18:29:43,702 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 18:29:43,703 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:43,703 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 18:29:43,703 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:43,710 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:43,927 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 18:29:45,852 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 18:29:45,852 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:45,853 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 18:29:45,853 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:45,860 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:46,076 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 18:29:48,006 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 18:29:48,006 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:48,006 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 18:29:48,006 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:48,014 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:48,315 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 18:29:50,067 - root - [INFO] - 	!!!Scores: {'accuracy': 0.542, 'average': 0.542}
2024-05-01 18:29:50,067 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:50,067 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 18:29:50,067 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:50,075 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:50,252 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 18:29:52,052 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 18:29:52,052 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:52,052 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 18:29:52,053 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:52,060 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:52,235 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 18:29:54,044 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 18:29:54,044 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:54,044 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 18:29:54,044 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:54,052 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:54,327 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 18:29:56,095 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 18:29:56,095 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:56,095 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 18:29:56,095 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:56,103 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:56,279 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 18:29:58,075 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 18:29:58,075 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 18:29:58,075 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 18:29:58,075 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:29:58,083 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 18:29:58,374 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 18:30:00,134 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 18:30:00,134 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:30:00,135 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 18:30:00,135 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:30:00,142 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 18:30:00,837 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:30:00,851 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:30:01,070 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 18:30:02,655 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 18:30:02,656 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:30:02,656 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 18:30:02,656 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:30:02,663 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:30:02,872 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 18:30:04,490 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 18:30:04,490 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:30:04,490 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 18:30:04,491 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:30:04,498 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:30:04,708 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 18:30:06,302 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 18:30:06,303 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:30:06,303 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 18:30:06,303 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:30:06,310 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:30:06,534 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 18:30:08,065 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 18:30:08,065 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:30:08,066 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 18:30:08,066 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:30:08,073 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:30:08,282 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 18:30:09,882 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 18:30:09,882 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:30:09,883 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 18:30:09,883 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:30:09,890 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:30:10,101 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 18:30:11,707 - root - [INFO] - 	!!!Scores: {'accuracy': 0.868, 'average': 0.868}
2024-05-01 18:30:11,707 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:30:11,708 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 18:30:11,708 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:30:11,715 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:30:11,925 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 18:30:13,487 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 18:30:13,487 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 18:30:13,488 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 18:30:13,488 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:30:13,495 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 18:30:13,704 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 18:30:15,258 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 18:30:15,258 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 18:30:15,258 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 18:30:15,258 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 18:30:15,272 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 18:30:15,954 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 18:30:16,643 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 18:30:38,446 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 19:14:07,336 - root - [INFO] - 	!!!Scores: {'accuracy': 0.424, 'average': 0.424}
2024-05-01 19:14:07,336 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:14:07,337 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 19:14:07,337 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:14:08,051 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 19:14:08,163 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:14:14,119 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:14:38,423 - root - [INFO] - 	!!!Scores: {'accuracy': 0.916, 'average': 0.916}
2024-05-01 19:14:38,423 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:14:38,423 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 19:14:38,424 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:14:38,432 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:14:44,458 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:15:09,183 - root - [INFO] - 	!!!Scores: {'accuracy': 0.916, 'average': 0.916}
2024-05-01 19:15:09,183 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:15:09,183 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 19:15:09,183 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:15:09,192 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:15:15,174 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:15:39,743 - root - [INFO] - 	!!!Scores: {'accuracy': 0.923, 'average': 0.923}
2024-05-01 19:15:39,743 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:15:39,743 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 19:15:39,743 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:15:39,751 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:15:45,775 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:16:10,355 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 19:16:10,356 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:16:10,356 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 19:16:10,356 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:16:10,364 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:16:16,368 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:16:41,635 - root - [INFO] - 	!!!Scores: {'accuracy': 0.914, 'average': 0.914}
2024-05-01 19:16:41,635 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:16:41,635 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 19:16:41,635 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:16:42,542 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:16:42,595 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:16:44,567 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:17:03,032 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 19:17:03,032 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:17:03,032 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 19:17:03,033 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:17:03,041 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:17:04,892 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:17:21,417 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 19:17:21,417 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:17:21,417 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 19:17:21,418 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:17:21,425 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:17:23,250 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:17:39,492 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 19:17:39,492 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:17:39,492 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 19:17:39,492 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:17:39,500 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:17:41,308 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:17:57,727 - root - [INFO] - 	!!!Scores: {'accuracy': 0.664, 'average': 0.664}
2024-05-01 19:17:57,727 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:17:57,728 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 19:17:57,728 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:17:57,736 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:17:59,541 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:18:16,308 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 19:18:16,308 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:18:16,308 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 19:18:16,308 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:18:16,317 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:18:18,141 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:18:34,592 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 19:18:34,592 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:18:34,592 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 19:18:34,592 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:18:34,600 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:18:36,977 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:18:55,093 - root - [INFO] - 	!!!Scores: {'accuracy': 0.659, 'average': 0.659}
2024-05-01 19:18:55,093 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:18:55,093 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 19:18:55,093 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:18:55,101 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:18:56,947 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:19:13,637 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 19:19:13,637 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:19:13,638 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 19:19:13,638 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:19:13,646 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:19:15,487 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:19:31,994 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 19:19:31,994 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:19:31,994 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 19:19:31,994 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:19:32,002 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:19:34,386 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:19:51,535 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 19:19:51,535 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:19:51,535 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 19:19:51,535 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:19:51,544 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:19:53,925 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:20:10,761 - root - [INFO] - 	!!!Scores: {'accuracy': 0.654, 'average': 0.654}
2024-05-01 19:20:10,761 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:20:10,761 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 19:20:10,761 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:20:10,769 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:20:12,573 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:20:29,093 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 19:20:29,093 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:20:29,093 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 19:20:29,093 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:20:29,101 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:20:31,461 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:20:48,327 - root - [INFO] - 	!!!Scores: {'accuracy': 0.629, 'average': 0.629}
2024-05-01 19:20:48,327 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:20:48,327 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 19:20:48,327 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:20:48,335 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:20:50,687 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:21:08,418 - root - [INFO] - 	!!!Scores: {'accuracy': 0.654, 'average': 0.654}
2024-05-01 19:21:08,418 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:21:08,418 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 19:21:08,418 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:21:08,426 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:21:10,269 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:21:26,794 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 19:21:26,795 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:21:26,795 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 19:21:26,795 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:21:27,717 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:21:27,768 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:21:29,573 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:21:47,497 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 19:21:47,497 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:21:47,498 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 19:21:47,498 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:21:47,506 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:21:49,346 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:22:05,582 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 19:22:05,582 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:22:05,582 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 19:22:05,582 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:22:05,590 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:22:07,405 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:22:23,256 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 19:22:23,256 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:22:23,256 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 19:22:23,257 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:22:23,264 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:22:25,064 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:22:41,104 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 19:22:41,105 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:22:41,105 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 19:22:41,105 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:22:41,112 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:22:42,910 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:22:59,364 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 19:22:59,364 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:22:59,365 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 19:22:59,365 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:22:59,372 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:23:01,188 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:23:17,293 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 19:23:17,294 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:23:17,294 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 19:23:17,294 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:23:17,302 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:23:19,674 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:23:37,320 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 19:23:37,320 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:23:37,321 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 19:23:37,321 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:23:37,329 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:23:39,172 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:23:55,566 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 19:23:55,566 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:23:55,566 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 19:23:55,566 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:23:55,574 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:23:57,413 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:24:13,648 - root - [INFO] - 	!!!Scores: {'accuracy': 0.515, 'average': 0.515}
2024-05-01 19:24:13,648 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:24:13,648 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 19:24:13,649 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:24:13,656 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:24:16,033 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:24:32,889 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 19:24:32,890 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:24:32,890 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 19:24:32,890 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:24:32,897 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:24:35,263 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:24:51,830 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 19:24:51,830 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:24:51,831 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 19:24:51,831 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:24:51,838 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:24:53,636 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:25:09,883 - root - [INFO] - 	!!!Scores: {'accuracy': 0.524, 'average': 0.524}
2024-05-01 19:25:09,883 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:25:09,883 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 19:25:09,883 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:25:09,891 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:25:12,245 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:25:28,790 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 19:25:28,790 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:25:28,790 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 19:25:28,790 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:25:28,799 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:25:31,147 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:25:48,524 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 19:25:48,524 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:25:48,524 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 19:25:48,524 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:25:48,532 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:25:50,370 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:26:06,616 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 19:26:06,616 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:26:06,616 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 19:26:06,616 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:26:07,607 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:26:07,665 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:26:09,826 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:26:35,116 - root - [INFO] - 	!!!Scores: {'accuracy': 0.49, 'average': 0.49}
2024-05-01 19:26:35,116 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:26:35,116 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 19:26:35,116 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:26:35,124 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:26:37,332 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:27:00,191 - root - [INFO] - 	!!!Scores: {'accuracy': 0.495, 'average': 0.495}
2024-05-01 19:27:00,191 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:27:00,191 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 19:27:00,192 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:27:00,199 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:27:02,380 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:27:24,851 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 19:27:24,852 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:27:24,852 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 19:27:24,852 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:27:24,860 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:27:27,022 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:27:49,703 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 19:27:49,703 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:27:49,703 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 19:27:49,703 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:27:49,712 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:27:51,866 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:28:15,113 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 19:28:15,113 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:28:15,113 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 19:28:15,113 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:28:15,121 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:28:17,304 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:28:40,043 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 19:28:40,044 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:28:40,044 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 19:28:40,044 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:28:40,052 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:28:42,899 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:29:07,763 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 19:29:07,763 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:29:07,764 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 19:29:07,764 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:29:07,771 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:29:09,979 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:29:33,115 - root - [INFO] - 	!!!Scores: {'accuracy': 0.472, 'average': 0.472}
2024-05-01 19:29:33,116 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:29:33,116 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 19:29:33,116 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:29:33,124 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:29:35,331 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:29:58,173 - root - [INFO] - 	!!!Scores: {'accuracy': 0.477, 'average': 0.477}
2024-05-01 19:29:58,173 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:29:58,173 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 19:29:58,173 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:29:58,182 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:30:01,257 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:30:24,984 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 19:30:24,984 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:30:24,984 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 19:30:24,984 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:30:24,992 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:30:27,850 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:30:51,208 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 19:30:51,208 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:30:51,208 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 19:30:51,209 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:30:51,217 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:30:53,379 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:31:16,216 - root - [INFO] - 	!!!Scores: {'accuracy': 0.489, 'average': 0.489}
2024-05-01 19:31:16,217 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:31:16,217 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 19:31:16,217 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:31:16,225 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:31:19,056 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:31:42,418 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 19:31:42,418 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:31:42,418 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 19:31:42,418 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:31:42,427 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:31:45,242 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:32:09,633 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 19:32:09,633 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:32:09,633 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 19:32:09,633 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:32:09,641 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:32:11,854 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:32:34,702 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 19:32:34,784 - root - [INFO] - Unexpected keys: []
2024-05-01 19:32:35,015 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:32:35,016 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 19:32:35,016 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:32:35,025 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 19:32:35,950 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:32:35,973 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:32:36,490 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 19:32:42,243 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 19:32:42,243 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:32:42,243 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 19:32:42,243 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:32:42,251 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:32:42,769 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 19:32:48,215 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 19:32:48,215 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:32:48,215 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 19:32:48,215 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:32:48,223 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:32:48,741 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 19:32:54,186 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 19:32:54,186 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:32:54,186 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 19:32:54,186 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:32:54,194 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:32:54,707 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 19:33:00,080 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 19:33:00,080 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:33:00,080 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 19:33:00,080 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:00,088 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:33:00,602 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 19:33:06,049 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 19:33:06,049 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:33:06,049 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 19:33:06,049 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:06,056 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:33:06,574 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 19:33:12,020 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 19:33:12,021 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:33:12,021 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 19:33:12,021 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:12,028 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:33:12,546 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 19:33:17,931 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 19:33:17,931 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:33:17,931 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 19:33:17,931 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:17,939 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:33:18,452 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 19:33:24,031 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 19:33:24,031 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:33:24,031 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 19:33:24,031 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:24,039 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:33:24,552 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 19:33:29,992 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 19:33:29,992 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 19:33:29,992 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 19:33:29,992 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:30,000 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 19:33:30,518 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 19:33:36,057 - root - [INFO] - 	!!!Scores: {'accuracy': 0.804, 'average': 0.804}
2024-05-01 19:33:36,058 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:36,058 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 19:33:36,058 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:36,065 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 19:33:36,972 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:33:36,981 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:37,045 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:38,489 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 19:33:38,489 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:38,489 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 19:33:38,489 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:38,496 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:38,547 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:39,993 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 19:33:39,993 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:39,993 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 19:33:39,994 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:40,001 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:40,064 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:41,531 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 19:33:41,532 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:41,532 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 19:33:41,532 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:41,539 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:41,590 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:43,024 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 19:33:43,024 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:43,024 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 19:33:43,024 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:43,031 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:43,081 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:44,522 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 19:33:44,522 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:44,522 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 19:33:44,522 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:44,529 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:44,593 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:46,043 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 19:33:46,043 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:46,043 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 19:33:46,044 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:46,051 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:46,101 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:47,540 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 19:33:47,540 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:47,540 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 19:33:47,540 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:47,548 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:47,611 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:49,062 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 19:33:49,062 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:49,062 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 19:33:49,062 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:49,069 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:49,120 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:50,561 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 19:33:50,562 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:50,562 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 19:33:50,562 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:50,569 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:50,620 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:52,064 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 19:33:52,065 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:52,065 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 19:33:52,065 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:52,072 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:52,136 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:53,591 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 19:33:53,592 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:53,592 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 19:33:53,592 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:53,599 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:53,649 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:55,092 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 19:33:55,092 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:55,092 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 19:33:55,092 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:55,099 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:55,150 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:56,635 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 19:33:56,636 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:56,636 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 19:33:56,636 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:56,643 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:56,694 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:58,135 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 19:33:58,135 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 19:33:58,135 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 19:33:58,135 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:33:58,143 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 19:33:58,207 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 19:33:59,685 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 19:33:59,686 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 19:33:59,686 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 19:33:59,686 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:34:00,607 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:34:00,662 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 19:34:04,262 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 19:34:13,890 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 19:34:13,890 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 19:34:13,890 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 19:34:13,891 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:34:13,898 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 19:34:17,429 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 19:34:27,131 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 19:34:27,131 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 19:34:27,131 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 19:34:27,131 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:34:27,139 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 19:34:30,737 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 19:34:40,516 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 19:34:40,516 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 19:34:40,517 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 19:34:40,517 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:34:40,525 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 19:34:44,162 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 19:34:54,259 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 19:34:54,259 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 19:34:54,259 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 19:34:54,259 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:34:54,267 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 19:34:57,860 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 19:35:07,846 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 19:35:07,846 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:35:07,846 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 19:35:07,846 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:35:07,854 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 19:35:08,773 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:35:08,826 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:35:10,317 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 19:35:15,640 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 19:35:15,641 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:35:15,641 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 19:35:15,641 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:35:15,648 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:35:17,137 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 19:35:22,323 - root - [INFO] - 	!!!Scores: {'accuracy': 0.644, 'average': 0.644}
2024-05-01 19:35:22,323 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:35:22,323 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 19:35:22,324 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:35:22,331 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:35:23,828 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 19:35:29,526 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 19:35:29,526 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:35:29,526 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 19:35:29,527 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:35:29,534 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:35:31,027 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 19:35:36,849 - root - [INFO] - 	!!!Scores: {'accuracy': 0.533, 'average': 0.533}
2024-05-01 19:35:36,849 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:35:36,849 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 19:35:36,849 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:35:36,857 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:35:38,336 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 19:35:43,741 - root - [INFO] - 	!!!Scores: {'accuracy': 0.541, 'average': 0.541}
2024-05-01 19:35:43,741 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:35:43,741 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 19:35:43,741 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:35:43,748 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:35:45,243 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 19:35:51,080 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 19:35:51,081 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:35:51,081 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 19:35:51,081 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:35:51,088 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:35:52,579 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 19:35:57,986 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 19:35:57,987 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:35:57,987 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 19:35:57,987 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:35:57,995 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:35:59,474 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 19:36:05,036 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 19:36:05,036 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:36:05,036 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 19:36:05,036 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:05,044 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:36:06,540 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 19:36:12,611 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 19:36:12,611 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 19:36:12,611 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 19:36:12,611 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:12,619 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 19:36:14,105 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 19:36:18,835 - root - [INFO] - 	!!!Scores: {'accuracy': 0.642, 'average': 0.642}
2024-05-01 19:36:18,835 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:18,835 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 19:36:18,835 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:18,843 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 19:36:19,528 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:36:19,544 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:19,733 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:21,522 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 19:36:21,523 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:21,523 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 19:36:21,523 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:21,530 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:21,702 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:23,464 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 19:36:23,464 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:23,464 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 19:36:23,464 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:23,473 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:23,689 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:25,613 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 19:36:25,613 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:25,613 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 19:36:25,613 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:25,620 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:25,836 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:27,762 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 19:36:27,762 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:27,762 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 19:36:27,762 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:27,769 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:27,952 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:29,702 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 19:36:29,702 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:29,703 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 19:36:29,703 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:29,710 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:29,885 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:31,685 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 19:36:31,685 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:31,685 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 19:36:31,685 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:31,693 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:31,865 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:33,675 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 19:36:33,675 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:33,675 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 19:36:33,675 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:33,682 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:33,955 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:35,722 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 19:36:35,722 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:35,722 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 19:36:35,722 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:35,729 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:35,902 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:37,699 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 19:36:37,699 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 19:36:37,699 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 19:36:37,699 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:37,706 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 19:36:37,994 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 19:36:39,755 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 19:36:39,755 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 19:36:39,755 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 19:36:39,755 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:39,762 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 19:36:40,664 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:36:40,680 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 19:36:40,896 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 19:36:42,480 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 19:36:42,481 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 19:36:42,481 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 19:36:42,481 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:42,488 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 19:36:42,696 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 19:36:44,313 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 19:36:44,314 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 19:36:44,314 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 19:36:44,314 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:44,321 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 19:36:44,529 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 19:36:46,124 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 19:36:46,124 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 19:36:46,124 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 19:36:46,124 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:46,132 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 19:36:46,355 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 19:36:47,887 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 19:36:47,887 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 19:36:47,887 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 19:36:47,887 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:47,895 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 19:36:48,102 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 19:36:49,698 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 19:36:49,699 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 19:36:49,699 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 19:36:49,699 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:49,706 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 19:36:49,915 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 19:36:51,521 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 19:36:51,521 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 19:36:51,521 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 19:36:51,521 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:51,528 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 19:36:51,736 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 19:36:53,296 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 19:36:53,297 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 19:36:53,297 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 19:36:53,297 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:53,304 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 19:36:53,512 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 19:36:55,066 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 19:36:55,066 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 19:36:55,066 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 19:36:55,066 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:36:55,079 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 19:36:55,980 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:36:56,662 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 19:37:18,476 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 19:42:31,586 - root - [INFO] - 	!!!Scores: {'accuracy': 0.425, 'average': 0.425}
2024-05-01 19:42:31,586 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:42:31,586 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 19:42:31,586 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:42:32,487 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 19:42:32,599 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:42:38,768 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:43:03,075 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 19:43:03,075 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:43:03,075 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 19:43:03,075 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:43:03,084 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:43:09,183 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:43:33,921 - root - [INFO] - 	!!!Scores: {'accuracy': 0.922, 'average': 0.922}
2024-05-01 19:43:33,921 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:43:33,921 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 19:43:33,921 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:43:33,929 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:43:39,968 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:44:04,538 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 19:44:04,538 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:44:04,538 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 19:44:04,538 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:44:04,546 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:44:10,615 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:44:35,239 - root - [INFO] - 	!!!Scores: {'accuracy': 0.916, 'average': 0.916}
2024-05-01 19:44:35,239 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 19:44:35,239 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 19:44:35,240 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:44:35,249 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 19:44:41,303 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 19:45:06,570 - root - [INFO] - 	!!!Scores: {'accuracy': 0.918, 'average': 0.918}
2024-05-01 19:45:06,570 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:45:06,570 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 19:45:06,570 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:45:07,490 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:45:07,543 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:45:09,351 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:45:27,808 - root - [INFO] - 	!!!Scores: {'accuracy': 0.643, 'average': 0.643}
2024-05-01 19:45:27,808 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:45:27,808 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 19:45:27,808 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:45:27,816 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:45:29,658 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:45:46,186 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 19:45:46,186 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:45:46,186 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 19:45:46,187 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:45:46,194 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:45:48,015 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:46:04,262 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 19:46:04,262 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:46:04,262 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 19:46:04,262 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:46:04,271 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:46:06,076 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:46:22,492 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 19:46:22,493 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:46:22,493 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 19:46:22,493 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:46:22,501 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:46:24,299 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:46:41,067 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 19:46:41,067 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:46:41,067 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 19:46:41,068 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:46:41,075 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:46:42,897 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:46:59,353 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 19:46:59,353 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:46:59,353 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 19:46:59,353 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:46:59,361 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:47:01,736 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:47:19,856 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 19:47:19,857 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:47:19,857 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 19:47:19,857 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:47:19,865 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:47:21,706 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:47:38,396 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 19:47:38,397 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:47:38,397 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 19:47:38,397 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:47:38,405 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:47:40,245 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:47:56,763 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 19:47:56,763 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:47:56,763 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 19:47:56,763 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:47:56,771 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:47:59,148 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:48:16,302 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 19:48:16,302 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:48:16,302 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 19:48:16,302 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:48:16,311 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:48:18,681 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:48:35,522 - root - [INFO] - 	!!!Scores: {'accuracy': 0.658, 'average': 0.658}
2024-05-01 19:48:35,522 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:48:35,522 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 19:48:35,522 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:48:35,530 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:48:37,331 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:48:53,854 - root - [INFO] - 	!!!Scores: {'accuracy': 0.67, 'average': 0.67}
2024-05-01 19:48:53,854 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:48:53,854 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 19:48:53,854 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:48:53,862 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:48:56,215 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:49:13,074 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 19:49:13,074 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:49:13,075 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 19:49:13,075 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:49:13,082 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:49:15,431 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:49:33,158 - root - [INFO] - 	!!!Scores: {'accuracy': 0.661, 'average': 0.661}
2024-05-01 19:49:33,158 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 19:49:33,158 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 19:49:33,158 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:49:33,166 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:49:35,004 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:49:51,525 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 19:49:51,526 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:49:51,526 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 19:49:51,526 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:49:52,236 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:49:52,289 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:49:54,095 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:50:12,007 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 19:50:12,007 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:50:12,007 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 19:50:12,007 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:50:12,015 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:50:13,856 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:50:30,097 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 19:50:30,097 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:50:30,097 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 19:50:30,097 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:50:30,106 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:50:31,923 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:50:47,766 - root - [INFO] - 	!!!Scores: {'accuracy': 0.519, 'average': 0.519}
2024-05-01 19:50:47,766 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:50:47,766 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 19:50:47,766 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:50:47,774 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:50:49,575 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:51:05,614 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 19:51:05,614 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:51:05,614 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 19:51:05,615 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:51:05,622 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:51:07,420 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:51:23,870 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 19:51:23,870 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:51:23,870 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 19:51:23,871 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:51:23,878 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:51:25,696 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:51:41,799 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 19:51:41,799 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:51:41,799 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 19:51:41,799 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:51:41,807 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:51:44,181 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:52:01,820 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 19:52:01,820 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:52:01,820 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 19:52:01,820 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:52:01,828 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:52:03,669 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:52:20,061 - root - [INFO] - 	!!!Scores: {'accuracy': 0.522, 'average': 0.522}
2024-05-01 19:52:20,062 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:52:20,062 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 19:52:20,062 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:52:20,070 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:52:21,908 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:52:38,147 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 19:52:38,147 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:52:38,147 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 19:52:38,147 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:52:38,155 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:52:40,529 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:52:57,383 - root - [INFO] - 	!!!Scores: {'accuracy': 0.507, 'average': 0.507}
2024-05-01 19:52:57,383 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:52:57,383 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 19:52:57,383 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:52:57,391 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:52:59,759 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:53:16,327 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 19:53:16,327 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:53:16,327 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 19:53:16,327 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:53:16,336 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:53:18,137 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:53:34,374 - root - [INFO] - 	!!!Scores: {'accuracy': 0.522, 'average': 0.522}
2024-05-01 19:53:34,374 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:53:34,374 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 19:53:34,375 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:53:34,382 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:53:36,739 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:53:53,265 - root - [INFO] - 	!!!Scores: {'accuracy': 0.506, 'average': 0.506}
2024-05-01 19:53:53,265 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:53:53,265 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 19:53:53,265 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:53:53,273 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:53:55,806 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:54:13,177 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 19:54:13,178 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 19:54:13,178 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 19:54:13,178 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:54:13,186 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 19:54:15,043 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 19:54:31,278 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 19:54:31,279 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:54:31,279 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 19:54:31,279 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:54:32,202 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 19:54:32,263 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:54:34,428 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:54:59,699 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 19:54:59,699 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:54:59,699 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 19:54:59,700 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:54:59,707 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:55:01,921 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:55:24,767 - root - [INFO] - 	!!!Scores: {'accuracy': 0.504, 'average': 0.504}
2024-05-01 19:55:24,767 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:55:24,767 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 19:55:24,767 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:55:24,775 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:55:26,957 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:55:49,424 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 19:55:49,425 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:55:49,425 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 19:55:49,425 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:55:49,434 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:55:51,596 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:56:14,263 - root - [INFO] - 	!!!Scores: {'accuracy': 0.509, 'average': 0.509}
2024-05-01 19:56:14,263 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:56:14,263 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 19:56:14,263 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:56:14,271 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:56:16,431 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 19:56:39,656 - root - [INFO] - 	!!!Scores: {'accuracy': 0.502, 'average': 0.502}
2024-05-01 19:56:39,656 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 19:56:39,656 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 19:56:39,656 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 19:56:39,664 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 19:56:41,845 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:00:17,901 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 20:00:17,901 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:00:17,901 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 20:00:17,901 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:00:17,910 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:00:20,756 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:00:45,302 - root - [INFO] - 	!!!Scores: {'accuracy': 0.494, 'average': 0.494}
2024-05-01 20:00:45,302 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:00:45,302 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 20:00:45,302 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:00:45,310 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:00:47,519 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:01:10,494 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 20:01:10,494 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:01:10,495 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 20:01:10,495 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:01:10,502 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:01:12,709 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:01:35,484 - root - [INFO] - 	!!!Scores: {'accuracy': 0.484, 'average': 0.484}
2024-05-01 20:01:35,484 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:01:35,484 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 20:01:35,484 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:01:35,492 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:01:38,339 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:02:02,047 - root - [INFO] - 	!!!Scores: {'accuracy': 0.491, 'average': 0.491}
2024-05-01 20:02:02,047 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:02:02,047 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 20:02:02,047 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:02:02,056 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:02:04,894 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:02:28,266 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 20:02:28,266 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:02:28,266 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 20:02:28,266 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:02:28,274 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:02:30,432 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:02:53,286 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 20:02:53,287 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:02:53,287 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 20:02:53,287 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:02:53,295 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:02:56,118 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:03:19,490 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 20:03:19,490 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:03:19,491 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 20:03:19,491 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:03:19,498 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:03:22,311 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:03:46,713 - root - [INFO] - 	!!!Scores: {'accuracy': 0.512, 'average': 0.512}
2024-05-01 20:03:46,714 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:03:46,714 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 20:03:46,714 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:03:46,723 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:03:48,933 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:04:11,783 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 20:04:11,863 - root - [INFO] - Unexpected keys: []
2024-05-01 20:04:12,094 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:04:12,094 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 20:04:12,094 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:04:12,103 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 20:04:12,788 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:04:12,811 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:04:13,328 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 20:04:19,079 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 20:04:19,079 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:04:19,079 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 20:04:19,079 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:04:19,086 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:04:19,605 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 20:04:25,046 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 20:04:25,046 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:04:25,046 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 20:04:25,046 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:04:25,053 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:04:25,578 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 20:04:31,021 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 20:04:31,021 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:04:31,022 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 20:04:31,022 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:04:31,029 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:04:31,545 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 20:04:36,915 - root - [INFO] - 	!!!Scores: {'accuracy': 0.776, 'average': 0.776}
2024-05-01 20:04:36,915 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:04:36,915 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 20:04:36,915 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:04:36,923 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:04:37,438 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 20:04:42,879 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 20:04:42,879 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:04:42,879 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 20:04:42,879 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:04:42,887 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:04:43,406 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 20:04:48,852 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 20:04:48,852 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:04:48,852 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 20:04:48,852 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:04:48,860 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:04:49,379 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 20:04:54,758 - root - [INFO] - 	!!!Scores: {'accuracy': 0.767, 'average': 0.767}
2024-05-01 20:04:54,758 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:04:54,759 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 20:04:54,759 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:04:54,766 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:04:55,280 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 20:05:00,856 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 20:05:00,856 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:05:00,857 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 20:05:00,857 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:00,864 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:05:01,379 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 20:05:06,816 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 20:05:06,816 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:05:06,816 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 20:05:06,816 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:06,823 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:05:07,343 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 20:05:12,878 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 20:05:12,878 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:12,878 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 20:05:12,878 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:12,886 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 20:05:13,803 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:05:13,812 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:13,876 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:15,319 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:05:15,319 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:15,319 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 20:05:15,319 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:15,327 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:15,378 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:16,824 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 20:05:16,824 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:16,824 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 20:05:16,824 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:16,832 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:16,896 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:18,364 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 20:05:18,364 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:18,364 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 20:05:18,364 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:18,371 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:18,422 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:19,856 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:05:19,856 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:19,856 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 20:05:19,857 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:19,864 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:19,914 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:21,355 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:05:21,355 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:21,355 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 20:05:21,355 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:21,362 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:21,426 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:22,876 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 20:05:22,876 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:22,876 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 20:05:22,876 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:22,884 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:22,934 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:24,372 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 20:05:24,372 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:24,372 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 20:05:24,372 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:24,380 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:24,444 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:25,894 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 20:05:25,894 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:25,894 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 20:05:25,894 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:25,901 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:25,952 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:27,396 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:05:27,396 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:27,396 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 20:05:27,396 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:27,403 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:27,454 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:28,900 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 20:05:28,900 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:28,900 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 20:05:28,900 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:28,907 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:28,972 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:30,428 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 20:05:30,429 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:30,429 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 20:05:30,429 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:30,436 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:30,487 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:31,930 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 20:05:31,930 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:31,930 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 20:05:31,930 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:31,938 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:31,989 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:33,477 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 20:05:33,477 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:33,477 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 20:05:33,477 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:33,484 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:33,535 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:34,979 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:05:34,979 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:05:34,979 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 20:05:34,979 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:34,986 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:05:35,051 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 20:05:36,528 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:05:36,528 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:05:36,528 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 20:05:36,528 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:37,436 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:05:37,492 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:05:41,091 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:05:50,719 - root - [INFO] - 	!!!Scores: {'accuracy': 0.682, 'average': 0.682}
2024-05-01 20:05:50,719 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:05:50,719 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 20:05:50,720 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:05:50,727 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:05:54,254 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:06:03,972 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 20:06:03,972 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:06:03,972 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 20:06:03,972 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:06:03,980 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:06:07,569 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:06:17,346 - root - [INFO] - 	!!!Scores: {'accuracy': 0.677, 'average': 0.677}
2024-05-01 20:06:17,347 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:06:17,347 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 20:06:17,347 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:06:17,354 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:06:20,986 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:06:31,073 - root - [INFO] - 	!!!Scores: {'accuracy': 0.672, 'average': 0.672}
2024-05-01 20:06:31,073 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:06:31,073 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 20:06:31,073 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:06:31,081 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:06:34,684 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:06:44,653 - root - [INFO] - 	!!!Scores: {'accuracy': 0.666, 'average': 0.666}
2024-05-01 20:06:44,654 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:06:44,654 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 20:06:44,654 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:06:44,662 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 20:06:45,572 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:06:45,624 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:06:47,118 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 20:06:52,438 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 20:06:52,438 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:06:52,438 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 20:06:52,438 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:06:52,446 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:06:53,936 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 20:06:59,116 - root - [INFO] - 	!!!Scores: {'accuracy': 0.644, 'average': 0.644}
2024-05-01 20:06:59,116 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:06:59,116 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 20:06:59,117 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:06:59,124 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:07:00,618 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 20:07:06,316 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 20:07:06,316 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:07:06,317 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 20:07:06,317 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:07:06,324 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:07:07,816 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 20:07:13,636 - root - [INFO] - 	!!!Scores: {'accuracy': 0.54, 'average': 0.54}
2024-05-01 20:07:13,636 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:07:13,636 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 20:07:13,636 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:07:13,644 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:07:15,123 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 20:07:20,527 - root - [INFO] - 	!!!Scores: {'accuracy': 0.551, 'average': 0.551}
2024-05-01 20:07:20,527 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:07:20,527 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 20:07:20,527 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:07:20,535 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:07:22,030 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 20:07:27,862 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 20:07:27,862 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:07:27,862 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 20:07:27,862 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:07:27,870 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:07:29,359 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 20:07:34,765 - root - [INFO] - 	!!!Scores: {'accuracy': 0.663, 'average': 0.663}
2024-05-01 20:07:34,765 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:07:34,765 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 20:07:34,765 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:07:34,773 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:07:36,251 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 20:07:41,811 - root - [INFO] - 	!!!Scores: {'accuracy': 0.523, 'average': 0.523}
2024-05-01 20:07:41,811 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:07:41,811 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 20:07:41,811 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:07:41,818 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:07:43,316 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 20:07:49,394 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 20:07:49,394 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:07:49,395 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 20:07:49,395 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:07:49,402 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:07:50,879 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 20:07:55,602 - root - [INFO] - 	!!!Scores: {'accuracy': 0.637, 'average': 0.637}
2024-05-01 20:07:55,602 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:07:55,602 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 20:07:55,602 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:07:55,610 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 20:07:56,521 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:07:56,538 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:07:56,728 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 20:07:58,515 - root - [INFO] - 	!!!Scores: {'accuracy': 0.583, 'average': 0.583}
2024-05-01 20:07:58,515 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:07:58,515 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 20:07:58,515 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:07:58,522 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:07:58,695 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 20:08:00,453 - root - [INFO] - 	!!!Scores: {'accuracy': 0.542, 'average': 0.542}
2024-05-01 20:08:00,453 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:08:00,453 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 20:08:00,454 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:00,461 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:08:00,677 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 20:08:02,599 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 20:08:02,599 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:08:02,599 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 20:08:02,599 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:02,607 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:08:02,822 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 20:08:04,748 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 20:08:04,748 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:08:04,748 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 20:08:04,748 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:04,756 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:08:04,938 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 20:08:06,685 - root - [INFO] - 	!!!Scores: {'accuracy': 0.569, 'average': 0.569}
2024-05-01 20:08:06,685 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:08:06,685 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 20:08:06,685 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:06,693 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:08:06,867 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 20:08:08,665 - root - [INFO] - 	!!!Scores: {'accuracy': 0.611, 'average': 0.611}
2024-05-01 20:08:08,665 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:08:08,665 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 20:08:08,665 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:08,673 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:08:08,845 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 20:08:10,653 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 20:08:10,654 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:08:10,654 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 20:08:10,654 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:10,661 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:08:10,934 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 20:08:12,702 - root - [INFO] - 	!!!Scores: {'accuracy': 0.514, 'average': 0.514}
2024-05-01 20:08:12,702 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:08:12,702 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 20:08:12,702 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:12,710 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:08:12,883 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 20:08:14,679 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 20:08:14,679 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:08:14,679 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 20:08:14,679 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:14,686 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:08:14,985 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 20:08:16,745 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 20:08:16,745 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:08:16,745 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 20:08:16,745 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:16,753 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 20:08:17,639 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:08:17,655 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:08:17,873 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 20:08:19,457 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 20:08:19,458 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:08:19,458 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 20:08:19,458 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:19,465 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:08:19,673 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 20:08:21,289 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 20:08:21,289 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:08:21,289 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 20:08:21,289 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:21,296 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:08:21,505 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 20:08:23,100 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 20:08:23,100 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:08:23,100 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 20:08:23,100 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:23,107 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:08:23,330 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 20:08:24,862 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 20:08:24,862 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:08:24,862 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 20:08:24,862 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:24,870 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:08:25,077 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 20:08:26,675 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 20:08:26,676 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:08:26,676 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 20:08:26,676 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:26,683 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:08:26,892 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 20:08:28,499 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 20:08:28,499 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:08:28,499 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 20:08:28,499 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:28,507 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:08:28,715 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 20:08:30,276 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 20:08:30,276 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:08:30,276 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 20:08:30,277 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:30,284 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:08:30,492 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 20:08:32,045 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 20:08:32,045 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 20:08:32,045 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 20:08:32,045 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:08:32,058 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 20:08:32,946 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:08:33,634 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 20:08:55,902 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
2024-05-01 20:14:08,770 - root - [INFO] - 	!!!Scores: {'accuracy': 0.426, 'average': 0.426}
2024-05-01 20:14:08,771 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 20:14:08,771 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_0/evaluation_runs.json
2024-05-01 20:14:08,771 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:14:09,704 - datasets.builder - [WARNING] - Found cached dataset arrow (/home/guodong/.cache/huggingface/datasets/arrow/default-2796e0ea53bfdf30/0.0.0/74f69db2c14c2860059d39860b1f400a03d11bf7fb5a8258ca38c501c878c137)
2024-05-01 20:14:09,816 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 20:14:15,795 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:0	Num Templates:5	Num Examples with Template:1839
2024-05-01 20:14:40,070 - root - [INFO] - 	!!!Scores: {'accuracy': 0.924, 'average': 0.924}
2024-05-01 20:14:40,070 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 20:14:40,070 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_1/evaluation_runs.json
2024-05-01 20:14:40,070 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:14:40,078 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 20:14:46,135 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:1	Num Templates:5	Num Examples with Template:1839
2024-05-01 20:15:10,853 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 20:15:10,853 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 20:15:10,853 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_2/evaluation_runs.json
2024-05-01 20:15:10,853 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:15:10,862 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 20:15:16,859 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:2	Num Templates:5	Num Examples with Template:1839
2024-05-01 20:15:41,419 - root - [INFO] - 	!!!Scores: {'accuracy': 0.926, 'average': 0.926}
2024-05-01 20:15:41,419 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 20:15:41,419 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_3/evaluation_runs.json
2024-05-01 20:15:41,419 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:15:41,427 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 20:15:47,471 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:3	Num Templates:5	Num Examples with Template:1839
2024-05-01 20:16:12,050 - root - [INFO] - 	!!!Scores: {'accuracy': 0.921, 'average': 0.921}
2024-05-01 20:16:12,050 - root - [INFO] - 	Evaluating model on story_cloze dataset
2024-05-01 20:16:12,050 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/story_cloze_template_4/evaluation_runs.json
2024-05-01 20:16:12,051 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:16:12,059 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Selected Examples: 1839	Num Total Example:1839
2024-05-01 20:16:18,068 - root - [INFO] - 	Dataset:STORY_CLOZE	Split:test	Num Selected Example with Templates:1839	Template Idx:4	Num Templates:5	Num Examples with Template:1839
2024-05-01 20:16:43,334 - root - [INFO] - 	!!!Scores: {'accuracy': 0.92, 'average': 0.92}
2024-05-01 20:16:43,335 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:16:43,335 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_0/evaluation_runs.json
2024-05-01 20:16:43,335 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:16:44,022 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-9d580cad42f0d988/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:16:44,074 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:16:45,885 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:17:04,352 - root - [INFO] - 	!!!Scores: {'accuracy': 0.644, 'average': 0.644}
2024-05-01 20:17:04,352 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:17:04,352 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_1/evaluation_runs.json
2024-05-01 20:17:04,352 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:17:04,361 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:17:06,198 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:17:22,730 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 20:17:22,730 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:17:22,730 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_2/evaluation_runs.json
2024-05-01 20:17:22,730 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:17:22,738 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:17:24,551 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:17:40,794 - root - [INFO] - 	!!!Scores: {'accuracy': 0.68, 'average': 0.68}
2024-05-01 20:17:40,794 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:17:40,794 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_3/evaluation_runs.json
2024-05-01 20:17:40,795 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:17:40,802 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:17:42,605 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:17:59,015 - root - [INFO] - 	!!!Scores: {'accuracy': 0.663, 'average': 0.663}
2024-05-01 20:17:59,016 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:17:59,016 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_4/evaluation_runs.json
2024-05-01 20:17:59,016 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:17:59,023 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:18:00,833 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:18:17,597 - root - [INFO] - 	!!!Scores: {'accuracy': 0.678, 'average': 0.678}
2024-05-01 20:18:17,597 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:18:17,597 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_5/evaluation_runs.json
2024-05-01 20:18:17,597 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:18:17,605 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:18:19,421 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:18:35,863 - root - [INFO] - 	!!!Scores: {'accuracy': 0.674, 'average': 0.674}
2024-05-01 20:18:35,863 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:18:35,863 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_6/evaluation_runs.json
2024-05-01 20:18:35,864 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:18:35,871 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:18:38,241 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:18:56,354 - root - [INFO] - 	!!!Scores: {'accuracy': 0.665, 'average': 0.665}
2024-05-01 20:18:56,354 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:18:56,354 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_7/evaluation_runs.json
2024-05-01 20:18:56,354 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:18:56,362 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:18:58,201 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:19:14,878 - root - [INFO] - 	!!!Scores: {'accuracy': 0.679, 'average': 0.679}
2024-05-01 20:19:14,878 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:19:14,878 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_8/evaluation_runs.json
2024-05-01 20:19:14,878 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:19:14,887 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:19:16,732 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:19:33,234 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 20:19:33,234 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:19:33,234 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_9/evaluation_runs.json
2024-05-01 20:19:33,234 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:19:33,242 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:19:35,612 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:19:52,753 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 20:19:52,753 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:19:52,754 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_10/evaluation_runs.json
2024-05-01 20:19:52,754 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:19:52,761 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:19:55,124 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:20:11,958 - root - [INFO] - 	!!!Scores: {'accuracy': 0.655, 'average': 0.655}
2024-05-01 20:20:11,958 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:20:11,958 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_11/evaluation_runs.json
2024-05-01 20:20:11,958 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:20:11,966 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:20:13,762 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:20:30,269 - root - [INFO] - 	!!!Scores: {'accuracy': 0.667, 'average': 0.667}
2024-05-01 20:20:30,270 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:20:30,270 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_12/evaluation_runs.json
2024-05-01 20:20:30,270 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:20:30,277 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:20:32,823 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:20:49,665 - root - [INFO] - 	!!!Scores: {'accuracy': 0.635, 'average': 0.635}
2024-05-01 20:20:49,665 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:20:49,665 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_13/evaluation_runs.json
2024-05-01 20:20:49,665 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:20:49,673 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:20:52,024 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:21:09,746 - root - [INFO] - 	!!!Scores: {'accuracy': 0.662, 'average': 0.662}
2024-05-01 20:21:09,746 - root - [INFO] - 	Evaluating model on anli-r1 dataset
2024-05-01 20:21:09,746 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r1_template_14/evaluation_runs.json
2024-05-01 20:21:09,746 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:21:09,754 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:21:11,593 - root - [INFO] - 	Dataset:ANLI-R1	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:21:28,107 - root - [INFO] - 	!!!Scores: {'accuracy': 0.669, 'average': 0.669}
2024-05-01 20:21:28,107 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:21:28,107 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_0/evaluation_runs.json
2024-05-01 20:21:28,107 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:21:28,805 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-2f5e309763d7971f/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:21:28,857 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:21:30,654 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:0	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:21:48,555 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 20:21:48,555 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:21:48,556 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_1/evaluation_runs.json
2024-05-01 20:21:48,556 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:21:48,563 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:21:50,398 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:1	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:22:06,617 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 20:22:06,617 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:22:06,617 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_2/evaluation_runs.json
2024-05-01 20:22:06,617 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:22:06,625 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:22:08,437 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:2	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:22:24,264 - root - [INFO] - 	!!!Scores: {'accuracy': 0.513, 'average': 0.513}
2024-05-01 20:22:24,264 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:22:24,264 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_3/evaluation_runs.json
2024-05-01 20:22:24,264 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:22:24,272 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:22:26,067 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:3	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:22:42,093 - root - [INFO] - 	!!!Scores: {'accuracy': 0.503, 'average': 0.503}
2024-05-01 20:22:42,093 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:22:42,093 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_4/evaluation_runs.json
2024-05-01 20:22:42,093 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:22:42,102 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:22:43,892 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:4	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:23:00,328 - root - [INFO] - 	!!!Scores: {'accuracy': 0.518, 'average': 0.518}
2024-05-01 20:23:00,328 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:23:00,328 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_5/evaluation_runs.json
2024-05-01 20:23:00,328 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:23:00,336 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:23:02,147 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:5	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:23:18,230 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 20:23:18,230 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:23:18,230 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_6/evaluation_runs.json
2024-05-01 20:23:18,230 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:23:18,238 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:23:20,603 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:6	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:23:38,236 - root - [INFO] - 	!!!Scores: {'accuracy': 0.516, 'average': 0.516}
2024-05-01 20:23:38,236 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:23:38,236 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_7/evaluation_runs.json
2024-05-01 20:23:38,236 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:23:38,244 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:23:40,077 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:7	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:23:56,455 - root - [INFO] - 	!!!Scores: {'accuracy': 0.521, 'average': 0.521}
2024-05-01 20:23:56,455 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:23:56,455 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_8/evaluation_runs.json
2024-05-01 20:23:56,456 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:23:56,463 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:23:58,296 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:8	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:24:14,522 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 20:24:14,522 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:24:14,522 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_9/evaluation_runs.json
2024-05-01 20:24:14,522 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:24:14,531 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:24:16,898 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:9	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:24:33,733 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 20:24:33,733 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:24:33,733 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_10/evaluation_runs.json
2024-05-01 20:24:33,733 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:24:33,741 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:24:36,099 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:10	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:24:52,644 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 20:24:52,644 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:24:52,644 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_11/evaluation_runs.json
2024-05-01 20:24:52,645 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:24:52,652 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:24:54,447 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:11	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:25:10,669 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 20:25:10,669 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:25:10,669 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_12/evaluation_runs.json
2024-05-01 20:25:10,669 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:25:10,677 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:25:13,027 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:12	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:25:29,552 - root - [INFO] - 	!!!Scores: {'accuracy': 0.501, 'average': 0.501}
2024-05-01 20:25:29,552 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:25:29,552 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_13/evaluation_runs.json
2024-05-01 20:25:29,553 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:25:29,560 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:25:31,910 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:13	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:25:49,265 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 20:25:49,266 - root - [INFO] - 	Evaluating model on anli-r2 dataset
2024-05-01 20:25:49,266 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r2_template_14/evaluation_runs.json
2024-05-01 20:25:49,266 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:25:49,274 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Selected Examples: 1000	Num Total Example:1000
2024-05-01 20:25:51,108 - root - [INFO] - 	Dataset:ANLI-R2	Split:test	Num Selected Example with Templates:1000	Template Idx:14	Num Templates:15	Num Examples with Template:1000
2024-05-01 20:26:07,342 - root - [INFO] - 	!!!Scores: {'accuracy': 0.52, 'average': 0.52}
2024-05-01 20:26:07,342 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:26:07,343 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_0/evaluation_runs.json
2024-05-01 20:26:07,343 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:26:08,037 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-5854c73e8e2a58bf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:26:08,096 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:26:10,253 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:0	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:26:35,545 - root - [INFO] - 	!!!Scores: {'accuracy': 0.496, 'average': 0.496}
2024-05-01 20:26:35,546 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:26:35,546 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_1/evaluation_runs.json
2024-05-01 20:26:35,546 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:26:35,554 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:26:37,759 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:1	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:27:00,607 - root - [INFO] - 	!!!Scores: {'accuracy': 0.498, 'average': 0.498}
2024-05-01 20:27:00,607 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:27:00,607 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_2/evaluation_runs.json
2024-05-01 20:27:00,607 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:27:00,615 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:27:02,793 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:2	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:27:25,260 - root - [INFO] - 	!!!Scores: {'accuracy': 0.482, 'average': 0.482}
2024-05-01 20:27:25,260 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:27:25,260 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_3/evaluation_runs.json
2024-05-01 20:27:25,260 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:27:25,268 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:27:27,447 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:3	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:27:50,113 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 20:27:50,113 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:27:50,113 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_4/evaluation_runs.json
2024-05-01 20:27:50,113 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:27:50,122 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:27:52,279 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:4	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:28:15,509 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 20:28:15,509 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:28:15,510 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_5/evaluation_runs.json
2024-05-01 20:28:15,510 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:28:15,518 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:28:17,696 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:5	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:28:40,427 - root - [INFO] - 	!!!Scores: {'accuracy': 0.509, 'average': 0.509}
2024-05-01 20:28:40,427 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:28:40,427 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_6/evaluation_runs.json
2024-05-01 20:28:40,427 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:28:40,435 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:28:43,281 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:6	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:29:08,139 - root - [INFO] - 	!!!Scores: {'accuracy': 0.492, 'average': 0.492}
2024-05-01 20:29:08,139 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:29:08,139 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_7/evaluation_runs.json
2024-05-01 20:29:08,139 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:29:08,147 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:29:10,353 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:7	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:29:33,484 - root - [INFO] - 	!!!Scores: {'accuracy': 0.485, 'average': 0.485}
2024-05-01 20:29:33,485 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:29:33,485 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_8/evaluation_runs.json
2024-05-01 20:29:33,485 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:29:33,493 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:29:35,698 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:8	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:29:58,530 - root - [INFO] - 	!!!Scores: {'accuracy': 0.487, 'average': 0.487}
2024-05-01 20:29:58,530 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:29:58,530 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_9/evaluation_runs.json
2024-05-01 20:29:58,530 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:29:58,539 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:30:01,385 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:9	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:30:25,094 - root - [INFO] - 	!!!Scores: {'accuracy': 0.493, 'average': 0.493}
2024-05-01 20:30:25,094 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:30:25,094 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_10/evaluation_runs.json
2024-05-01 20:30:25,095 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:30:25,103 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:30:27,941 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:10	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:30:51,291 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 20:30:51,291 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:30:51,291 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_11/evaluation_runs.json
2024-05-01 20:30:51,291 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:30:51,299 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:30:53,456 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:11	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:31:16,291 - root - [INFO] - 	!!!Scores: {'accuracy': 0.497, 'average': 0.497}
2024-05-01 20:31:16,291 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:31:16,291 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_12/evaluation_runs.json
2024-05-01 20:31:16,291 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:31:16,299 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:31:19,122 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:12	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:31:42,473 - root - [INFO] - 	!!!Scores: {'accuracy': 0.511, 'average': 0.511}
2024-05-01 20:31:42,473 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:31:42,473 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_13/evaluation_runs.json
2024-05-01 20:31:42,473 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:31:42,481 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:31:45,293 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:13	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:32:09,673 - root - [INFO] - 	!!!Scores: {'accuracy': 0.505, 'average': 0.505}
2024-05-01 20:32:09,673 - root - [INFO] - 	Evaluating model on anli-r3 dataset
2024-05-01 20:32:09,673 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/anli-r3_template_14/evaluation_runs.json
2024-05-01 20:32:09,673 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:32:09,681 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Selected Examples: 1200	Num Total Example:1200
2024-05-01 20:32:11,889 - root - [INFO] - 	Dataset:ANLI-R3	Split:test	Num Selected Example with Templates:1200	Template Idx:14	Num Templates:15	Num Examples with Template:1200
2024-05-01 20:32:34,738 - root - [INFO] - 	!!!Scores: {'accuracy': 0.499, 'average': 0.499}
2024-05-01 20:32:34,819 - root - [INFO] - Unexpected keys: []
2024-05-01 20:32:35,050 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:32:35,050 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_0/evaluation_runs.json
2024-05-01 20:32:35,050 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:32:35,059 - root - [INFO] - 		Loading Full Data for rte
2024-05-01 20:32:35,964 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-667c257dd17a5fbb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:32:35,988 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:32:36,506 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:0	Num Templates:10	Num Examples with Template:245
2024-05-01 20:32:42,258 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 20:32:42,258 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:32:42,258 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_1/evaluation_runs.json
2024-05-01 20:32:42,258 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:32:42,265 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:32:42,784 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:1	Num Templates:10	Num Examples with Template:245
2024-05-01 20:32:48,225 - root - [INFO] - 	!!!Scores: {'accuracy': 0.796, 'average': 0.796}
2024-05-01 20:32:48,225 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:32:48,226 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_2/evaluation_runs.json
2024-05-01 20:32:48,226 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:32:48,233 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:32:48,752 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:2	Num Templates:10	Num Examples with Template:245
2024-05-01 20:32:54,194 - root - [INFO] - 	!!!Scores: {'accuracy': 0.812, 'average': 0.812}
2024-05-01 20:32:54,194 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:32:54,194 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_3/evaluation_runs.json
2024-05-01 20:32:54,194 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:32:54,202 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:32:54,716 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:3	Num Templates:10	Num Examples with Template:245
2024-05-01 20:33:00,086 - root - [INFO] - 	!!!Scores: {'accuracy': 0.784, 'average': 0.784}
2024-05-01 20:33:00,086 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:33:00,086 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_4/evaluation_runs.json
2024-05-01 20:33:00,086 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:00,094 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:33:00,609 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:4	Num Templates:10	Num Examples with Template:245
2024-05-01 20:33:06,056 - root - [INFO] - 	!!!Scores: {'accuracy': 0.808, 'average': 0.808}
2024-05-01 20:33:06,056 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:33:06,056 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_5/evaluation_runs.json
2024-05-01 20:33:06,056 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:06,063 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:33:06,582 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:5	Num Templates:10	Num Examples with Template:245
2024-05-01 20:33:12,028 - root - [INFO] - 	!!!Scores: {'accuracy': 0.788, 'average': 0.788}
2024-05-01 20:33:12,028 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:33:12,028 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_6/evaluation_runs.json
2024-05-01 20:33:12,028 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:12,036 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:33:12,555 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:6	Num Templates:10	Num Examples with Template:245
2024-05-01 20:33:17,933 - root - [INFO] - 	!!!Scores: {'accuracy': 0.771, 'average': 0.771}
2024-05-01 20:33:17,934 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:33:17,934 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_7/evaluation_runs.json
2024-05-01 20:33:17,934 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:17,941 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:33:18,455 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:7	Num Templates:10	Num Examples with Template:245
2024-05-01 20:33:24,036 - root - [INFO] - 	!!!Scores: {'accuracy': 0.78, 'average': 0.78}
2024-05-01 20:33:24,036 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:33:24,036 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_8/evaluation_runs.json
2024-05-01 20:33:24,036 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:24,044 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:33:24,557 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:8	Num Templates:10	Num Examples with Template:245
2024-05-01 20:33:29,997 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 20:33:29,997 - root - [INFO] - 	Evaluating model on rte dataset
2024-05-01 20:33:29,997 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/rte_template_9/evaluation_runs.json
2024-05-01 20:33:29,997 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:30,005 - root - [INFO] - 	Dataset:RTE	Split:test	Selected Examples: 245	Num Total Example:245
2024-05-01 20:33:30,525 - root - [INFO] - 	Dataset:RTE	Split:test	Num Selected Example with Templates:245	Template Idx:9	Num Templates:10	Num Examples with Template:245
2024-05-01 20:33:36,062 - root - [INFO] - 	!!!Scores: {'accuracy': 0.8, 'average': 0.8}
2024-05-01 20:33:36,062 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:36,062 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_0/evaluation_runs.json
2024-05-01 20:33:36,062 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:36,070 - root - [INFO] - 		Loading Full Data for cb
2024-05-01 20:33:36,753 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-c26e73467e3f215e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:33:36,762 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:36,827 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:0	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:38,271 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:33:38,271 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:38,271 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_1/evaluation_runs.json
2024-05-01 20:33:38,272 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:38,279 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:38,330 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:1	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:39,775 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:33:39,775 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:39,775 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_2/evaluation_runs.json
2024-05-01 20:33:39,775 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:39,782 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:39,846 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:2	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:41,314 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 20:33:41,314 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:41,314 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_3/evaluation_runs.json
2024-05-01 20:33:41,314 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:41,321 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:41,372 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:3	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:42,805 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:33:42,805 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:42,805 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_4/evaluation_runs.json
2024-05-01 20:33:42,805 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:42,812 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:42,863 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:4	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:44,303 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:33:44,304 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:44,304 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_5/evaluation_runs.json
2024-05-01 20:33:44,304 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:44,311 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:44,375 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:5	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:45,826 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 20:33:45,826 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:45,826 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_6/evaluation_runs.json
2024-05-01 20:33:45,826 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:45,833 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:45,884 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:6	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:47,322 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 20:33:47,322 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:47,322 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_7/evaluation_runs.json
2024-05-01 20:33:47,322 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:47,329 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:47,393 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:7	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:48,843 - root - [INFO] - 	!!!Scores: {'accuracy': 0.75, 'average': 0.75}
2024-05-01 20:33:48,844 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:48,844 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_8/evaluation_runs.json
2024-05-01 20:33:48,844 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:48,851 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:48,902 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:8	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:50,346 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:33:50,346 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:50,346 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_9/evaluation_runs.json
2024-05-01 20:33:50,346 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:50,354 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:50,405 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:9	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:51,851 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:33:51,851 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:51,851 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_10/evaluation_runs.json
2024-05-01 20:33:51,851 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:51,858 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:51,923 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:10	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:53,379 - root - [INFO] - 	!!!Scores: {'accuracy': 0.875, 'average': 0.875}
2024-05-01 20:33:53,379 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:53,379 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_11/evaluation_runs.json
2024-05-01 20:33:53,379 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:53,386 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:53,437 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:11	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:54,879 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 20:33:54,880 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:54,880 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_12/evaluation_runs.json
2024-05-01 20:33:54,880 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:54,887 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:54,938 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:12	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:56,426 - root - [INFO] - 	!!!Scores: {'accuracy': 0.792, 'average': 0.792}
2024-05-01 20:33:56,426 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:56,426 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_13/evaluation_runs.json
2024-05-01 20:33:56,426 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:56,433 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:56,484 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:13	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:57,927 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:33:57,927 - root - [INFO] - 	Evaluating model on cb dataset
2024-05-01 20:33:57,927 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/cb_template_14/evaluation_runs.json
2024-05-01 20:33:57,927 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:33:57,935 - root - [INFO] - 	Dataset:CB	Split:test	Selected Examples: 24	Num Total Example:24
2024-05-01 20:33:57,999 - root - [INFO] - 	Dataset:CB	Split:test	Num Selected Example with Templates:24	Template Idx:14	Num Templates:15	Num Examples with Template:24
2024-05-01 20:33:59,476 - root - [INFO] - 	!!!Scores: {'accuracy': 0.833, 'average': 0.833}
2024-05-01 20:33:59,477 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:33:59,477 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_0/evaluation_runs.json
2024-05-01 20:33:59,477 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:34:00,169 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-3ae7a9f52719d86a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:34:00,226 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:34:03,825 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:0	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:34:13,451 - root - [INFO] - 	!!!Scores: {'accuracy': 0.684, 'average': 0.684}
2024-05-01 20:34:13,451 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:34:13,452 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_1/evaluation_runs.json
2024-05-01 20:34:13,452 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:34:13,459 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:34:16,996 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:1	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:34:26,714 - root - [INFO] - 	!!!Scores: {'accuracy': 0.671, 'average': 0.671}
2024-05-01 20:34:26,714 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:34:26,714 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_2/evaluation_runs.json
2024-05-01 20:34:26,714 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:34:26,721 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:34:30,440 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:2	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:34:40,209 - root - [INFO] - 	!!!Scores: {'accuracy': 0.676, 'average': 0.676}
2024-05-01 20:34:40,209 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:34:40,209 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_3/evaluation_runs.json
2024-05-01 20:34:40,209 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:34:40,217 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:34:43,855 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:3	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:34:53,945 - root - [INFO] - 	!!!Scores: {'accuracy': 0.673, 'average': 0.673}
2024-05-01 20:34:53,946 - root - [INFO] - 	Evaluating model on winogrande dataset
2024-05-01 20:34:53,946 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/winogrande_template_4/evaluation_runs.json
2024-05-01 20:34:53,946 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:34:53,953 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Selected Examples: 1235	Num Total Example:1235
2024-05-01 20:34:57,563 - root - [INFO] - 	Dataset:WINOGRANDE	Split:test	Num Selected Example with Templates:1235	Template Idx:4	Num Templates:5	Num Examples with Template:1235
2024-05-01 20:35:07,540 - root - [INFO] - 	!!!Scores: {'accuracy': 0.668, 'average': 0.668}
2024-05-01 20:35:07,541 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:35:07,541 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_0/evaluation_runs.json
2024-05-01 20:35:07,541 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:35:07,549 - root - [INFO] - 		Loading Full Data for wic
2024-05-01 20:35:08,256 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-8f888d9273347dcb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:35:08,311 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:35:09,803 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:0	Num Templates:10	Num Examples with Template:606
2024-05-01 20:35:15,124 - root - [INFO] - 	!!!Scores: {'accuracy': 0.645, 'average': 0.645}
2024-05-01 20:35:15,125 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:35:15,125 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_1/evaluation_runs.json
2024-05-01 20:35:15,125 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:35:15,133 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:35:16,627 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:1	Num Templates:10	Num Examples with Template:606
2024-05-01 20:35:21,806 - root - [INFO] - 	!!!Scores: {'accuracy': 0.65, 'average': 0.65}
2024-05-01 20:35:21,806 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:35:21,806 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_2/evaluation_runs.json
2024-05-01 20:35:21,806 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:35:21,814 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:35:23,312 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:2	Num Templates:10	Num Examples with Template:606
2024-05-01 20:35:29,018 - root - [INFO] - 	!!!Scores: {'accuracy': 0.64, 'average': 0.64}
2024-05-01 20:35:29,018 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:35:29,018 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_3/evaluation_runs.json
2024-05-01 20:35:29,019 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:35:29,027 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:35:30,520 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:3	Num Templates:10	Num Examples with Template:606
2024-05-01 20:35:36,346 - root - [INFO] - 	!!!Scores: {'accuracy': 0.54, 'average': 0.54}
2024-05-01 20:35:36,346 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:35:36,346 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_4/evaluation_runs.json
2024-05-01 20:35:36,346 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:35:36,353 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:35:37,834 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:4	Num Templates:10	Num Examples with Template:606
2024-05-01 20:35:43,242 - root - [INFO] - 	!!!Scores: {'accuracy': 0.541, 'average': 0.541}
2024-05-01 20:35:43,242 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:35:43,242 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_5/evaluation_runs.json
2024-05-01 20:35:43,242 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:35:43,250 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:35:44,745 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:5	Num Templates:10	Num Examples with Template:606
2024-05-01 20:35:50,586 - root - [INFO] - 	!!!Scores: {'accuracy': 0.649, 'average': 0.649}
2024-05-01 20:35:50,586 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:35:50,586 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_6/evaluation_runs.json
2024-05-01 20:35:50,586 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:35:50,594 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:35:52,085 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:6	Num Templates:10	Num Examples with Template:606
2024-05-01 20:35:57,496 - root - [INFO] - 	!!!Scores: {'accuracy': 0.662, 'average': 0.662}
2024-05-01 20:35:57,496 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:35:57,496 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_7/evaluation_runs.json
2024-05-01 20:35:57,496 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:35:57,504 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:35:58,983 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:7	Num Templates:10	Num Examples with Template:606
2024-05-01 20:36:04,544 - root - [INFO] - 	!!!Scores: {'accuracy': 0.517, 'average': 0.517}
2024-05-01 20:36:04,544 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:36:04,544 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_8/evaluation_runs.json
2024-05-01 20:36:04,544 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:04,552 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:36:06,048 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:8	Num Templates:10	Num Examples with Template:606
2024-05-01 20:36:12,126 - root - [INFO] - 	!!!Scores: {'accuracy': 0.624, 'average': 0.624}
2024-05-01 20:36:12,126 - root - [INFO] - 	Evaluating model on wic dataset
2024-05-01 20:36:12,126 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wic_template_9/evaluation_runs.json
2024-05-01 20:36:12,126 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:12,134 - root - [INFO] - 	Dataset:WIC	Split:test	Selected Examples: 606	Num Total Example:606
2024-05-01 20:36:13,612 - root - [INFO] - 	Dataset:WIC	Split:test	Num Selected Example with Templates:606	Template Idx:9	Num Templates:10	Num Examples with Template:606
2024-05-01 20:36:18,339 - root - [INFO] - 	!!!Scores: {'accuracy': 0.647, 'average': 0.647}
2024-05-01 20:36:18,339 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:18,339 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_0/evaluation_runs.json
2024-05-01 20:36:18,340 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:18,347 - root - [INFO] - 		Loading Full Data for wsc
2024-05-01 20:36:19,038 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-4bcfdb1f33f6e278/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:36:19,056 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:19,245 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:0	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:21,034 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 20:36:21,034 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:21,034 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_1/evaluation_runs.json
2024-05-01 20:36:21,034 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:21,042 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:21,214 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:1	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:22,974 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 20:36:22,974 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:22,974 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_2/evaluation_runs.json
2024-05-01 20:36:22,974 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:22,981 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:23,198 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:2	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:25,122 - root - [INFO] - 	!!!Scores: {'accuracy': 0.681, 'average': 0.681}
2024-05-01 20:36:25,122 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:25,122 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_3/evaluation_runs.json
2024-05-01 20:36:25,123 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:25,130 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:25,345 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:3	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:27,273 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 20:36:27,273 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:27,273 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_4/evaluation_runs.json
2024-05-01 20:36:27,273 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:27,280 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:27,462 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:4	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:29,210 - root - [INFO] - 	!!!Scores: {'accuracy': 0.556, 'average': 0.556}
2024-05-01 20:36:29,210 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:29,210 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_5/evaluation_runs.json
2024-05-01 20:36:29,210 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:29,218 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:29,392 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:5	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:31,191 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 20:36:31,191 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:31,191 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_6/evaluation_runs.json
2024-05-01 20:36:31,191 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:31,198 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:31,371 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:6	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:33,179 - root - [INFO] - 	!!!Scores: {'accuracy': 0.625, 'average': 0.625}
2024-05-01 20:36:33,180 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:33,180 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_7/evaluation_runs.json
2024-05-01 20:36:33,180 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:33,187 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:33,460 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:7	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:35,226 - root - [INFO] - 	!!!Scores: {'accuracy': 0.5, 'average': 0.5}
2024-05-01 20:36:35,226 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:35,226 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_8/evaluation_runs.json
2024-05-01 20:36:35,226 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:35,234 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:35,407 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:8	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:37,199 - root - [INFO] - 	!!!Scores: {'accuracy': 0.639, 'average': 0.639}
2024-05-01 20:36:37,199 - root - [INFO] - 	Evaluating model on wsc dataset
2024-05-01 20:36:37,199 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/wsc_template_9/evaluation_runs.json
2024-05-01 20:36:37,199 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:37,206 - root - [INFO] - 	Dataset:WSC	Split:test	Selected Examples: 72	Num Total Example:72
2024-05-01 20:36:37,496 - root - [INFO] - 	Dataset:WSC	Split:test	Num Selected Example with Templates:72	Template Idx:9	Num Templates:10	Num Examples with Template:72
2024-05-01 20:36:39,254 - root - [INFO] - 	!!!Scores: {'accuracy': 0.653, 'average': 0.653}
2024-05-01 20:36:39,254 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:36:39,254 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_0/evaluation_runs.json
2024-05-01 20:36:39,254 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:39,261 - root - [INFO] - 		Loading Full Data for copa
2024-05-01 20:36:39,960 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-75ee279101665e7b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:36:39,974 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:36:40,190 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:0	Num Templates:8	Num Examples with Template:68
2024-05-01 20:36:41,775 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 20:36:41,775 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:36:41,775 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_1/evaluation_runs.json
2024-05-01 20:36:41,775 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:41,782 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:36:41,990 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:1	Num Templates:8	Num Examples with Template:68
2024-05-01 20:36:43,608 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 20:36:43,608 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:36:43,608 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_2/evaluation_runs.json
2024-05-01 20:36:43,608 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:43,615 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:36:43,823 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:2	Num Templates:8	Num Examples with Template:68
2024-05-01 20:36:45,419 - root - [INFO] - 	!!!Scores: {'accuracy': 0.912, 'average': 0.912}
2024-05-01 20:36:45,419 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:36:45,419 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_3/evaluation_runs.json
2024-05-01 20:36:45,419 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:45,426 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:36:45,649 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:3	Num Templates:8	Num Examples with Template:68
2024-05-01 20:36:47,182 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 20:36:47,182 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:36:47,182 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_4/evaluation_runs.json
2024-05-01 20:36:47,182 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:47,190 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:36:47,397 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:4	Num Templates:8	Num Examples with Template:68
2024-05-01 20:36:48,996 - root - [INFO] - 	!!!Scores: {'accuracy': 0.897, 'average': 0.897}
2024-05-01 20:36:48,996 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:36:48,996 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_5/evaluation_runs.json
2024-05-01 20:36:48,996 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:49,003 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:36:49,212 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:5	Num Templates:8	Num Examples with Template:68
2024-05-01 20:36:50,818 - root - [INFO] - 	!!!Scores: {'accuracy': 0.853, 'average': 0.853}
2024-05-01 20:36:50,818 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:36:50,818 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_6/evaluation_runs.json
2024-05-01 20:36:50,818 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:50,825 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:36:51,033 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:6	Num Templates:8	Num Examples with Template:68
2024-05-01 20:36:52,595 - root - [INFO] - 	!!!Scores: {'accuracy': 0.838, 'average': 0.838}
2024-05-01 20:36:52,595 - root - [INFO] - 	Evaluating model on copa dataset
2024-05-01 20:36:52,595 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/copa_template_7/evaluation_runs.json
2024-05-01 20:36:52,595 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:52,602 - root - [INFO] - 	Dataset:COPA	Split:test	Selected Examples: 68	Num Total Example:68
2024-05-01 20:36:52,810 - root - [INFO] - 	Dataset:COPA	Split:test	Num Selected Example with Templates:68	Template Idx:7	Num Templates:8	Num Examples with Template:68
2024-05-01 20:36:54,361 - root - [INFO] - 	!!!Scores: {'accuracy': 0.882, 'average': 0.882}
2024-05-01 20:36:54,361 - root - [INFO] - 	Evaluating model on h-swag dataset
2024-05-01 20:36:54,361 - root - [INFO] - Found cached runs exp_out/merging/ia3-bigscience-T0_3B/tpa-es10_ia3_base_test/rte,cb,winogrande,wic,wsc,copa,h-swag,story_cloze,anli-r1,anli-r2,anli-r3/lambda_searched/predictions/multiple_prompts/test/h-swag_template_0/evaluation_runs.json
2024-05-01 20:36:54,361 - root - [INFO] - did_run_finish not matching but not required to match 
2024-05-01 20:36:54,375 - root - [INFO] - 		Loading Full Data for h-swag
2024-05-01 20:36:55,073 - datasets.builder - [WARNING] - Found cached dataset json (/home/guodong/.cache/huggingface/datasets/json/default-eac19eb96b26d7b6/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
2024-05-01 20:36:55,762 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Selected Examples: 10010	Num Total Example:10010
2024-05-01 20:37:17,769 - root - [INFO] - 	Dataset:H-SWAG	Split:test	Num Selected Example with Templates:10010	Template Idx:0	Num Templates:1	Num Examples with Template:10010
